Solr Search fails

2011-05-06 Thread deniz
Hi all. I have been trying to implement a universal search on a field, but
somehow it fails...

When I make a full import everything is OK and I can see the indexed field. But
when I make a query like

universal:Male

it shows no match

any ideas?



Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-06 Thread deniz
It seems like I will use dismax... I have tried some other ways but dismax
seems the best :)



Re: Solr Search fails

2011-05-06 Thread Grijesh
What is your field type and analysis chain?

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Re: Solr Search fails

2011-05-06 Thread deniz
The type is string and I use the standard analyzer (I am not sure what you mean
by the word "chain").



Use Solr / Lucene to search in a Logfile

2011-05-06 Thread Robert Naczinski
Hello,

we want to search large log4j logfiles with Solr / Lucene and find any
lines with a special argument (for example: with a defined user id).
There should be a Solr input document for each row in the file. The log
file is growing continuously and the search index must be refreshed.

Has anybody already implemented something like this and can give me a tip?

Greetings,

Robert
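A minimal SolrJ sketch of this idea, one Solr document per log line; the field
names, URL, and commit policy here are illustrative assumptions, not Robert's
actual setup:

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LogLineIndexer {
    public static void main(String[] args) throws Exception {
        // Assumes a schema with fields: id (unique key), lineNo (long), line (text).
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        long lineNo = 0;
        String line;
        while ((line = in.readLine()) != null) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", args[0] + ":" + lineNo); // stable id: file name + line number
            doc.addField("lineNo", lineNo++);
            doc.addField("line", line);
            solr.add(doc);
        }
        solr.commit(); // re-run from the last indexed line as the file grows
        in.close();
    }
}

Re-running this periodically (remembering the last line already indexed) would
keep the index refreshed as the file grows.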


How to convert date/timestamp to long in data-config.xml

2011-05-06 Thread Shanmugavel SRD
SOLR : 1.4.1
There are 1,300,000+ documents in the index. Sorting on a date field with
timestamps leads to an OutOfMemoryError. So, we are looking for a way to copy
the timestamp as a long value to a field and sort based on that field. Can
anyone help me on how to convert the timestamp to a long value in
data-config.xml? Is there any existing transformer?



Re: Solr Search fails

2011-05-06 Thread Grijesh
If its type is string then you can search only for the exact text, not for any
part of the string, and matching is case sensitive.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
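For reference, the usual fix is a tokenized field type instead of string; a
minimal schema.xml sketch, with illustrative type and analyzer choices:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="universal" type="text_general" indexed="true" stored="true"/>

With a type like this, after re-indexing, universal:Male would match the term
regardless of case or of its position within the field value.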


Re: Solr Search fails

2011-05-06 Thread deniz
Well, I have already tried that... both the exact text and also some word in
the whole text... nothing changes...



How can i use Solr based Search Engine for My University?

2011-05-06 Thread Anurag
I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a
central university in India. I want to use my search engine for the benefit
of students. The university has courses like undergraduate, graduate, PhD
etc., including Engineering. Earlier one of my teachers suggested developing
an intranet search (for the LAN), but I am not able to figure out how to
implement it. My university uses Google as its own site search tool.

I am in the Engg department and I see students (including me) using Xeroxes,
previous year papers, notes etc. during exam time. People use the internet,
say Google, to learn any topic that is not included in a book.

Please give some valuable suggestions.

Thanks

-
Kumar Anurag



Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread Paul Libbrecht
Have you looked at Nutch?
Or any other web-harvester?
That seems to be closest.
paul


On 6 May 2011, at 10:01, Anurag wrote:

 I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a
 central university in India. I want to use my search engine for the benefit
 of students. The university has courses like undergraduate, graduate, PhD
 etc., including Engineering. Earlier one of my teachers suggested developing
 an intranet search (for the LAN), but I am not able to figure out how to
 implement it. My university uses Google as its own site search tool.
 
 I am in the Engg department and I see students (including me) using Xeroxes,
 previous year papers, notes etc. during exam time. People use the internet,
 say Google, to learn any topic that is not included in a book.
 
 Please give some valuable suggestions.
 
 Thanks
 
 -
 Kumar Anurag
 



Re: Solr Search fails

2011-05-06 Thread Grijesh
Please provide your schema for more detail about your problem.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread Grijesh
Use Nutch for your Intranet crawling. For more detail:
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 


Re: How to convert date/timestamp to long in data-config.xml

2011-05-06 Thread Grijesh
Hi,
You can convert the timestamp to a long by writing a custom transformer. But how
will that help with the OutOfMemoryError? Any sorting will use the Lucene field
cache, which will take a lot of memory since you have huge data.
If you can, buy more RAM for your server.

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
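A rough sketch of such a custom DIH transformer; it assumes the JDBC driver
returns the column as java.sql.Timestamp, and the class, column, and field
names are illustrative:

import java.sql.Timestamp;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class TimestampToLongTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // Copy the timestamp column into a long-valued field for sorting.
        Object ts = row.get("LAST_MODIFIED");
        if (ts instanceof Timestamp) {
            row.put("last_modified_l", ((Timestamp) ts).getTime());
        }
        return row;
    }
}

The class is then referenced via the transformer attribute on the <entity> in
data-config.xml.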


Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread Anurag
In my search engine, Nutch and Solr have both been integrated. I have also
implemented an auto-crawling process: whenever anyone puts an HTTP link in a
given box and then submits it, the site address gets automatically
crawled and indexed into Solr...



On Fri, May 6, 2011 at 2:02 PM, Grijesh [via Lucene] 
ml-node+2907200-1529372386-146...@n3.nabble.com wrote:

 Use Nutch for your Intranet crawling.For more detail
 http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
 Thanx:
 Grijesh
 www.gettinhahead.co.in







-- 
Kumar Anurag




RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-06 Thread Ahmet Arslan


--- On Fri, 5/6/11, Rohit ro...@in-rev.com wrote:

 From: Rohit ro...@in-rev.com
 Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:
 To: solr-user@lucene.apache.org
 Date: Friday, May 6, 2011, 8:47 AM
 Hi Craig,
 
 Thanks for the response. Actually what we need to achieve is to see group-by
 results based on dates like:
 
 2011-01-01  23
 2011-01-02  14
 2011-01-03  40
 2011-01-04  10
 
 Now the records in my table run into millions; grouping the result based on
 UTC date would not produce the right result, since the result should be
 grouped on the user's timezone.  Is there any way we can
 achieve this in Solr?

The easiest way would be to create an additional string-typed field and use
copyField to populate it (copy the first 10 characters from the (t)date into
the string).

Then facet on that string field:  facet=on&facet.field=SDATE

<field name="DATE" type="tdate" indexed="true" stored="true"/>
<field name="SDATE" type="string" indexed="true" stored="true"/>

<copyField source="DATE" dest="SDATE" maxChars="10"/>


uima fieldMappings and solr dynamicField

2011-05-06 Thread Koji Sekiguchi
Hello,

I'd like to use dynamicField in the feature-field mapping of the UIMA update
processor. It doesn't seem to be supported currently. Is it a bad idea
in terms of UIMA usage? If it is not so bad, I'd like to try a patch.

Background:

Because my UIMA annotator can generate many types of named entity from
a text, I don't want to implement so many types, but one type, NamedEntity:

<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.rondhuit.uima.next.NamedEntity</name>
      <description/>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>name</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>entity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

sample extracted named entities:

name=PERSON, entity=Barack Obama
name=TITLE, entity=the President

Now, I'd like to map these named entities to Solr fields like this:

PERSON_S:Barack Obama
TITLE_S:the President

Because the name types (PERSON, TITLE, etc.) can be so many,
I'd like to use the dynamicField *_s, where * is replaced by the name
feature of NamedEntity.

I think this is a natural requirement from the Solr point of view, but I'm
not sure whether my UIMA annotator implementation is correct. In other
words, should I implement a separate type for each entity type?
(e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)

Thank you!

Koji
-- 
http://www.rondhuit.com/en/


Re: Thoughts on Search Analytics?

2011-05-06 Thread findbestopensource
1. Reports based on Location. Group by City / Country
2. Total search performed per hour / week / month
3. Frequently used search keywords
4. Analytics based on search keywords.

Regards
Aditya
www.findbestopensource.com


On Fri, May 6, 2011 at 3:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:

 Hi,

  I'd like to solicit your thoughts about Search Analytics if you are doing
  any
  sort of analysis/reporting of search logs or click stream or anything
  related.
 
  * Which information or reports do you find the most useful and why?
  * Which reports would you like to have, but don't have for whatever reason
  (don't have the needed data, or it's too hard to produce such reports, or
  ...)
  * Which tool(s) or service(s) do you use and find the most useful?
 
  I'm preparing a presentation on the topic of Search Analytics, so I'm
  trying to
  solicit opinions, practices, desires, etc. on this topic.
 
  Your thoughts would be greatly appreciated.  If you could reply directly,
  that
  would be great, since this may be a bit OT for the list.
 
  Thanks!
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/



Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread findbestopensource
Hello Anurag

Google is always there to do internet search. You need to support search for
your university. My opinion would be: don't crawl the sites. You require only
Solr and not Nutch.

1. Provide an interface for the university students to upload documents.
The documents could be previous year question papers, notes, e-books etc.
Scan the documents, convert them to PDF and upload them. Providing search
on these things would be more valuable than crawling the sites.

Regards
Aditya
www.findbestopensource.com



On Fri, May 6, 2011 at 1:31 PM, Anurag anurag.it.jo...@gmail.com wrote:

 I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a
 central university in India. I want to use my search engine for the benefit
 of students. The university has courses like undergraduate, graduate, PhD
 etc., including Engineering. Earlier one of my teachers suggested developing
 an intranet search (for the LAN), but I am not able to figure out how to
 implement it. My university uses Google as its own site search tool.

 I am in the Engg department and I see students (including me) using Xeroxes,
 previous year papers, notes etc. during exam time. People use the internet,
 say Google, to learn any topic that is not included in a book.

 Please give some valuable suggestions.

 Thanks

 -
 Kumar Anurag




Re: UIMA analysisEngine path

2011-05-06 Thread Tommaso Teofili
Barry, I understand your need and I agree with you that it'd be useful to be able
to load AEs also from the filesystem; I created SOLR-2501 [1] to track that
requirement.
Consider that loading AEs from relative paths, as with using relative paths in
general, is not a good practice, since different environments could resolve the
relative path from different points in the filesystem; I think a good
solution would be using solr.home as the root of a relative path, because
that is a Solr instance/core property.
Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/SOLR-2501

2011/5/5 Barry Hathaway bhath...@nycap.rr.com

 Tommaso,

 Thanks. Now Solr finds the descriptor; however, I think this is very bad
 practice.
 Descriptors really aren't meant to be jarred up. They often contain
 relative paths.
 For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear

 where the AnalysisEngine descriptor contained in desc is an aggregate
 analysis engine and
 refers to other analysis engines packaged as installed PEAR files in the
 pear subdirectory.
 As such, the descriptor contains relative paths pointing into the pear
 subdirectory.
  Grabbing the descriptor from the jar breaks that, since
  OverridingParamsAEProvider
  uses the XMLInputSource constructor without the relative-path signature.

 Barry


 On 5/4/2011 6:16 AM, Tommaso Teofili wrote:

 Hello Barry,
  the main AnalysisEngine descriptor defined inside the <analysisEngine>
  element should be inside one of the jars imported with the <lib> elements.
 At the moment it cannot be taken from expanded directories but it should
 be
 easy to do it (and indeed useful) modifying the
 OverridingParamsAEProvider class
 [1] at line 57.
 Hope this helps,
 Tommaso

 [1] :

 http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup

2011/5/3 Barry Hathaway bhath...@nycap.rr.com

 I'm new to Solr and trying to get it to call a UIMA aggregate analysis
 engine
 and not having much luck.
 The null pointer exception indicates that it can't find the xml file
 associated with the engine.
 I have tried a number of combinations of a path in the <analysisEngine>
  element, but nothing
 seems to work. In addition, I've put the directory containing the
 descriptor in both the classpath
 when starting the server and in a <lib> element in solrconfig.xml. So:

 What classpath does the <analysisEngine> tag effectively search to
 locate the descriptor?

 Do the <lib> entries in solrconfig.xml affect this classpath?

 Do the engine descriptors have to be in a jar or can they be in an
 expanded
 directory?

 Thanks in advance.

 Barry








Re: why query chinese character with bracket become phrase query by default?

2011-05-06 Thread Michael McCandless
On Thu, May 5, 2011 at 10:00 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 2011/5/5 Michael McCandless luc...@mikemccandless.com:
 The very first thing every non-whitespace language Solr app should do
  is turn off autoGeneratePhraseQueries!

 Luckily, this is configurable per FieldType... so if it doesn't exist
 yet, we should come up with a good
 CJK fieldtype to add to the example schema.

+1

Shouldn't we have field types in the example schema for the different
languages?  I.e., text_zh, text_th, text_en, text_ja, text_nl, etc.

Mike

http://blog.mikemccandless.com
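A minimal sketch of what such a field type could look like in schema.xml; the
name and analysis chain are illustrative (StandardTokenizer emits CJK
characters as single-character tokens, which is what makes the phrase-query
default painful):

<fieldType name="text_zh" class="solr.TextField"
           autoGeneratePhraseQueries="false" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>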


How many UpdateHandlers can a Solr config have?

2011-05-06 Thread Julian Heise
Hello everyone,

 

just a very basic question, but I haven't been able to find the answer in
the Solr wiki: how many updateHandlers can one Solr config have? Just one?
Or many?

 

Thank you very much

 

-Julian



Re: DIH for e-mails

2011-05-06 Thread Erick Erickson
Take a look at Transformers, perhaps a custom Transformer.

They're surprisingly easy to add. Essentially, if you write your own
it gets a map representing the Solr document. That map
contains all of the modifications to the document made by any
other Transformers previously defined, and you can freely add/remove
fields in the map. DIH will then pass the entire result off to Solr
to be indexed.

Best
Erick

2011/5/5 m _ 米蟲ы~ fangzhenp...@foxmail.com:
 I’m using Data Import Handler for index emails.


 The problem is that I wanna add my own field such as security_number.


 Someone have any idea?


 Regards,


 --

  James  Bond Fang
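For reference, the wiring would look roughly like this in data-config.xml; the
transformer class name is hypothetical, and the MailEntityProcessor connection
attributes carry placeholder values:

<entity name="mail"
        processor="MailEntityProcessor"
        transformer="com.example.SecurityNumberTransformer"
        user="someone@example.com"
        password="..."
        host="imap.example.com"
        protocol="imaps"
        folders="INBOX"/>

The hypothetical SecurityNumberTransformer would implement
transformRow(Map, Context) as Erick describes, putting the computed
security_number value into the row map.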


Re: Use Solr / Lucene to search in a Logfile

2011-05-06 Thread Otis Gospodnetic
Hi Robert,

Have you considered just using Loggly.com ?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message -----
 From: Robert Naczinski robert.naczin...@googlemail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, May 6, 2011 3:01:45 AM
 Subject: Use Solr / Lucene to search in a Logfile
 
 Hello,
 
 we want to search large log4j logfiles with Solr / Lucene and find any
 lines with a special argument (for example: with a defined user id).
 It should be a Solr input document for each row in the file. The log
 file is growing continuously and the search index must be refreshed.
 
 Has anybody already implemented something like this and can give me a tip?
 
 Greetings,
 
 Robert
 


Re: Thoughts on Search Analytics?

2011-05-06 Thread Otis Gospodnetic

Hi Aditya,


----- Original Message -----

 From: findbestopensource findbestopensou...@gmail.com

 1. Reports based on Location. Group by City / Country

In other words, much like what one gets in Google Analytics?

 2. Total search performed per hour / week / month
 3. Frequently used search keywords
 4. Analytics based on search keywords.

Could you please elaborate and be more specific about this last one?

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


 On Fri, May 6, 2011 at 3:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
   wrote:
 
  Hi,
 
  I'd like to solicit your thoughts about Search Analytics if you are doing
  any
  sort of analysis/reporting of search logs or click stream or anything
  related.
 
  * Which information or reports do you find the most useful and why?
  * Which reports would you like to have, but don't have for whatever reason
  (don't have the needed data, or it's too hard to produce such reports, or
  ...)
  * Which tool(s) or service(s) do you use and find the most useful?
 
  I'm preparing a presentation on the topic of Search Analytics, so I'm
  trying to
 
  solicit opinions, practices, desires, etc. on this topic.
 
  Your thoughts would be greatly appreciated.  If you could reply directly,
  that
  would be great, since this may be a bit OT for the list.
 
  Thanks!
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 


Re: How can i use Solr based Search Engine for My University?

2011-05-06 Thread Anurag
Thanks Aditya, I appreciate your suggestion. I will implement your
suggestions. Besides these, is there any other useful aspect that I may
not be taking into account?
Thanks a lot.

On Fri, May 6, 2011 at 4:57 PM, findbestopensource [via Lucene] 
ml-node+2907727-211212-146...@n3.nabble.com wrote:

 Hello Anurag

 Google is always there to do internet search. You need to support search
 for your university. My opinion would be: don't crawl the sites. You
 require only Solr and not Nutch.

 1. Provide an interface for the university students to upload documents.

 The documents could be previous year question papers, notes, e-books etc.
 Scan the documents, convert them to PDF and upload them. Providing search
 on these things would be more valuable than crawling the sites.

 Regards
 Aditya
 www.findbestopensource.com



 On Fri, May 6, 2011 at 1:31 PM, Anurag [hidden email] wrote:

  I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a
  central university in India. I want to use my search engine for the
  benefit of students. The university has courses like undergraduate,
  graduate, PhD etc., including Engineering. Earlier one of my teachers
  suggested developing an intranet search (for the LAN), but I am not able
  to figure out how to implement it. My university uses Google as its own
  site search tool.
 
  I am in the Engg department and I see students (including me) using
  Xeroxes, previous year papers, notes etc. during exam time. People use
  the internet, say Google, to learn any topic that is not included in a
  book.
 
  Please give some valuable suggestions.
 
  Thanks
 
  -
  Kumar Anurag
 







-- 
Kumar Anurag




Re: Solr Terms and Date field issues

2011-05-06 Thread Erick Erickson
OK, I'm reaching a little here, but I think it's got a pretty good chance
of being the issue you're seeing. Sure hope somebody jumps
in and corrects me if I'm wrong (hint hint)...

I haven't delved into the actual Trie code, this is just from looking
with TermsComponent and Luke. Using Solr 1.4.1 BTW.

What you're seeing is a consequence of the trie field type with a
precision step other than 0. Trie fields with precisionStep > 0 add
extra stuff to the index to allow more efficient range queries. A hint
about this is that your 5 documents with the tdate type produce
16 tokens rather than just 5.

If you try your experiment with the date type (which is a trie type with
precisionStep=0) you'll see exactly what you expect.

So the long and short of it is that Solr's working as expected, and
you can use your index without worrying. But if you're trying to do
some lower-level term walking, you'll either have to filter stuff out,
copy your dates to something with precisionStep=0 and use that
field, or...

Best
Erick
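A sketch of the copyField approach Erick mentions, using a precisionStep=0
date type; the type and field names are illustrative:

<fieldType name="date_p0" class="solr.TrieDateField" precisionStep="0" omitNorms="true"/>

<field name="timestamp_p0" type="date_p0" indexed="true" stored="false"/>
<copyField source="timestamp" dest="timestamp_p0"/>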

On Thu, May 5, 2011 at 9:08 PM, Ahmet Arslan iori...@yahoo.com wrote:


  It is okay to see weird things in admin/schema.jsp or the terms component with
  trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/

  If you really need the terms component, consider using copyField (tdate to string
  type).




 Please find attached the schema and some test data (test.xml).

 Thanks for looking this.
 Viswa


 Date: Thu, 5 May 2011 19:08:31 -0400
 Subject: Re: Solr Terms and Date field issues
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org

  Hmmm, this is puzzling. If you could come up with a couple of xml
 files and a schema
 that illustrate this, I'll see what I can see...

 Thanks,
 Erick

 On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:
 
  Erik,
 
   I suspected the same, and set up a test instance to reproduce this. The
   date field I used is set up to capture indexing time; in other words the
   schema has a default value of NOW. However, I have reproduced this issue
   with fields which do not have defaults too.
 
   On the second one, I did a delete-commit (with expungeDeletes=true) and
   then an optimize. All other fields show updated terms except the date
   fields. I have also double-checked to see if the Luke handler has any
   different terms, and it did not.
 
 
  Thanks
  Viswa
 
 
  Date: Wed, 4 May 2011 08:17:39 -0400
  Subject: Re: Solr Terms and Date field issues
  From: erickerick...@gmail.com
  To: solr-user@lucene.apache.org
 
  Hmmm, this *looks* like you've changed your schema without
  re-indexing all your data so you're getting old (string?) values in
  that field, but that's just a guess. If this is really happening on a
  clean index it's a problem.
 
  I'm also going to guess that you're not really deleting the documents
  you think. Are you committing after the deletes?
 
  Best
  Erick
 
  On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:
  
   Hello,
  
    The terms query for a date field seems to get populated with some weird
    dates; many of these dates (1970, 2009, 2011-04-23) are not present in
    the indexed data.  Please see the sample data below.
   
    I also notice that a delete and optimize does not remove the relevant
    terms for date fields; the string fields seem to work fine.
  
   Thanks
   Viswa
  
    Results from the Terms component:

    <int name="2011-05-04T02:01:32.928Z">3479</int>
    <int name="2011-05-04T02:00:19.2Z">3479</int>
    <int name="2011-05-03T22:34:58.432Z">3479</int>
    <int name="2011-04-23T01:36:14.336Z">3479</int>
    <int name="2009-03-13T13:23:01.248Z">3479</int>
    <int name="1970-01-01T00:00:00Z">3479</int>
    <int name="1970-01-01T00:00:00Z">3479</int>
    <int name="1970-01-01T00:00:00Z">3479</int>
    <int name="1970-01-01T00:00:00Z">3479</int>
    <int name="2011-05-04T02:01:34.592Z">265</int>

    Result from the facet component, rounded to seconds:

    <lst name="InsertTime">
    <int name="2011-05-04T02:01:32Z">1</int>
    <int name="2011-05-04T02:01:33Z">1148</int>
    <int name="2011-05-04T02:01:34Z">2333</int>
    <str name="gap">+1SECOND</str>
    <date name="start">2011-05-03T06:14:14Z</date>
    <date name="end">2011-05-04T06:14:14Z</date>
    </lst>
  
 





Re: Testing the limits of non-Java Solr

2011-05-06 Thread Erick Erickson
You've hit it right on the head... if you can use the standard
analyzers/filters/etc, you're in good shape.

You have to process the output (xml, json, whatever) as Otis
says, but that's in whatever language your app server uses.

But when was the last time you were motivated to write a blog
post like "just used the package and it all worked" :). Perhaps
one of the things you're seeing is that people are motivated
to write about the nifty parts of what they do... Coupled with
the fact that people write to the users' list exactly because they
can't make the standard stuff do their particular task.

It's nice to know you *can* extend it with plugins for those gnarly
situations though.

So I say go for it!

Best
Erick


On Thu, May 5, 2011 at 6:28 PM, Jack Repenning jrepenn...@collab.net wrote:
 What's the probability that I can build a non-trivial Solr app without 
 writing any Java?

 I've been planning to use Solr, Lucene, and existing plug-ins, and sort of 
 hoping not to write any Java (the app itself is Ruby / Rails). The dox (such 
 as http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, 
 but my planning's all been no Java.]

 I'm just beginning the design work in earnest, and I suddenly notice that it 
 seems every mail thread, blog, or example starts out Java-free, but somehow 
 ends up involving Java code. I'm not sure I yet understand all these 
 snippets; conceivably some of the Java I see could just as easily be written 
 in another language, but it makes me wonder. Is it realistic to plan a 
 sizable Solr application without some Java programming?

 I know, I know, I know: everything depends on the details. I'd be interested 
 even in anecdotes: has anyone ever achieved this before? Also, what are the 
 clues I should look for that I need to step into the Java realm? I 
 understand, for example, that it's possible to write filters and tokenizers 
 to do stuff not available in any standard one; in this case, the clue would 
 be I can't find what I want in the standard list, I guess. Are there other 
 things I should look for?

 -==-
 Jack Repenning
 Technologist
 Codesion Business Unit
 CollabNet, Inc.
 8000 Marina Boulevard, Suite 600
 Brisbane, California 94005
 office: +1 650.228.2562
 twitter: http://twitter.com/jrep












Re: UIMA analysisEngine path

2011-05-06 Thread Barry Hathaway

Thanks for creating the case to track the requirement.
I really don't agree with your comments about using relative paths though.
The only way to specify the AEs making up an aggregate AE is to use an
<import location="..."/>,
leaving you to choose either an absolute path, a relative path, or a URL.  All of
these are not that great.
You are not allowed to use environment variables. The UIMA documentation
clearly states
that relative paths are relative with respect to the location of the
descriptor containing the import.
That is the way in which XMLInputSource works.  Solr's
OverridingParamsAEProvider, in my opinion,
is clearly broken. If it wants to suck a descriptor out of a jar then it
MUST call XMLInputSource using
the signature in which both the descriptor name AND the path to the jar
containing it are passed in, so
that XMLInputSource knows how to process the descriptor.

Barry

On 5/6/2011 8:47 AM, Tommaso Teofili wrote:

Barry, I understand your need and I agree with you that it'd be useful to be able
to load AEs also from the filesystem; I created SOLR-2501 [1] to track that
requirement.
Consider that loading AEs from relative paths, as with using relative paths in
general, is not a good practice, since different environments could resolve the
relative path from different points in the filesystem; I think a good
solution would be using solr.home as the root of a relative path, because
that is a Solr instance/core property.
Regards,
Tommaso

[1] : https://issues.apache.org/jira/browse/SOLR-2501

2011/5/5 Barry Hathaway bhath...@nycap.rr.com


Tommaso,

Thanks. Now Solr finds the descriptor; however, I think this is very bad
practice.
Descriptors really aren't meant to be jarred up. They often contain
relative paths.
For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear

where the AnalysisEngine descriptor contained in desc is an aggregate
analysis engine and
refers to other analysis engines packaged as installed PEAR files in the
pear subdirectory.
As such, the descriptor contains relative paths pointing into the pear
subdirectory.
Grabbing the descriptor from the jar breaks that, since
OverridingParamsAEProvider
uses the XMLInputSource constructor without the relative-path signature.

Barry


On 5/4/2011 6:16 AM, Tommaso Teofili wrote:


Hello Barry,
the main AnalysisEngine descriptor defined inside the <analysisEngine>
element should be inside one of the jars imported with the <lib> elements.
At the moment it cannot be taken from expanded directories but it should
be
easy to do it (and indeed useful) modifying the
OverridingParamsAEProvider class
[1] at line 57.
Hope this helps,
Tommaso

[1] :

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup

2011/5/3 Barry Hathaway bhath...@nycap.rr.com

  I'm new to Solr and trying to get it to call a UIMA aggregate analysis

engine
and not having much luck.
The null pointer exception indicates that it can't find the xml file
associated with the engine.
I have tried a number of combinations of a path in the <analysisEngine>
  element, but nothing
seems to work. In addition, I've put the directory containing the
descriptor in both the classpath
when starting the server and in a <lib> element in solrconfig.xml. So:

What classpath does the <analysisEngine> tag effectively search to
locate the descriptor?

Do the <lib> entries in solrconfig.xml affect this classpath?

Do the engine descriptors have to be in a jar or can they be in an
expanded
directory?

Thanks in advance.

Barry









Re: UIMA analysisEngine path

2011-05-06 Thread Tommaso Teofili
Hello Barry,

2011/5/6 Barry Hathaway bhath...@nycap.rr.com

 Thanks for creating the case to track the requirement.
 I really don't agree with your comments about using relative paths though.
 The only way to specify the AEs making up an aggregate AE is to use an
 <import location="..."/>,
 leaving you to choose either an absolute path, a relative path, or a URL.


this is not true; you can also do <import name="..."/>, which handles
classpaths and datapaths. Have a look here:
http://uima.apache.org/d/uimaj-2.3.1/references.html#ugr.ref.xml.component_descriptor.imports


  All of these are not that great.
 You are not allowed to use environment variables. The UIMA documentation
 clearly states
 that relative paths are relative with respect to the location of the
 descriptor containing the import.


I know that; I meant the relative path used to retrieve the main aggregate AE
from Solr, not the relative path used in <import location="..."/> to get the
delegate AEs from the aggregate AE.
I am not proposing to introduce environment variables, I am just saying that
if we want to support relative paths then I think it'd be a nice idea to
choose where the relative file URL starts.



 That is the way in which XMLInputSource works.  Solr's
 OverridingParamsAEProvider, in my opinion,
 is clearly broken. If it wants to suck a descriptor out of a jar then it
 MUST call XMLInputSource using
 the signature in which both the descriptor name AND the path to the jar
 containing it are passed in, so
 that XMLInputSource knows how to process the descriptor.


The XMLInputSource class offers a URL-based constructor which is useful to serve
both scenarios [1].
I am OK with also supporting descriptors retrieved from the filesystem; this was
not taken into account in the first implementation since many existing annotators
already deliver descriptors bundled inside the jars/pears, but this addition
sounds like a good improvement so, basically, let's do it ;-)
Regards,
Tommaso

[1] :
http://uima.apache.org/d/uimaj-2.3.1/api/org/apache/uima/util/XMLInputSource.html#XMLInputSource(java.net.URL)



 Barry


 On 5/6/2011 8:47 AM, Tommaso Teofili wrote:

 Barry, I understand your need and I agree with you that it'd be useful to be
 able
 to load AEs also from the filesystem; I created SOLR-2501 [1] to track that
 requirement.
 Consider that loading AEs from relative paths, as with using relative paths in
 general, is not a good practice, since different environments could resolve the
 relative path from different points in the filesystem; I think a
 good
 solution would be using solr.home as the root of a relative path,
 because
 that is a Solr instance/core property.
 Regards,
 Tommaso

 [1] : https://issues.apache.org/jira/browse/SOLR-2501

2011/5/5 Barry Hathaway bhath...@nycap.rr.com

  Tommaso,

 Thanks. Now Solr finds the descriptor; however, I think this is very bad
 practice.
 Descriptors really aren't meant to be jarred up. They often contain
 relative paths.
 For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear

 where the AnalysisEngine descriptor contained in desc is an aggregate
 analysis engine and
 refers to other analysis engines packaged as installed PEAR files in the
 pear subdirectory.
 As such, the descriptor contains relative paths pointing into the pear
 subdirectory.
 Grabbing the descriptor from the jar breaks that, since
 OverridingParamsAEProvider
 uses the XMLInputSource constructor without the relative-path signature.

 Barry


 On 5/4/2011 6:16 AM, Tommaso Teofili wrote:

  Hello Barry,
 the main AnalysisEngine descriptor defined inside the <analysisEngine>
 element should be inside one of the jars imported with the <lib>
 elements.
 At the moment it cannot be taken from expanded directories but it should
 be
 easy to do it (and indeed useful) modifying the
 OverridingParamsAEProvider class
 [1] at line 57.
 Hope this helps,
 Tommaso

 [1] :


 http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup

 2011/5/3 Barry Hathaway bhath...@nycap.rr.com

  I'm new to Solr and trying to get it to call a UIMA aggregate analysis

 engine
 and not having much luck.
 The null pointer exception indicates that it can't find the xml file
 associated with the engine.
 I have tried a number of combinations of a path in the <analysisEngine>
  element, but nothing
 seems to work. In addition, I've put the directory containing the
 descriptor in both the classpath
 when starting the server and in a <lib> element in solrconfig.xml. So:

 What classpath does the <analysisEngine> tag effectively search
 to
 locate the descriptor?

 Do the <lib> entries in solrconfig.xml affect this classpath?

 Do the engine descriptors have to be in a jar or can they be in an
 expanded
 directory?

 Thanks in advance.

 Barry









Michigan Information Retrieval Enthusiasts Group Quarterly Meetup - May 19th 2011

2011-05-06 Thread Provalov, Ivan
Our next IR Meetup is at Cengage Learning on May 19, 2011.  Please RSVP here:
http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group/events/17567795/

Presentations:
1. Bayesian Language Model
This talk presents a Bayesian language model, originally described by (Teh 
2006), which uses a hierarchical Pitman-Yor process to describe the 
distribution of n-grams in an n-gram language model and which allows for a 
Bayesian back-off and smoothing strategy. The language model, which assumes a 
power-law prior over the n-gram space, compares favorably with language models 
based upon state of the art empirical n-gram smoothing techniques. In addition 
to the language model, and primarily because the background information 
required to understand it is somewhat difficult, that material, most of which
does not appear in (Teh 2006), is also presented in some detail. In particular, 
background information related to the Dirichlet distribution and the Dirichlet 
process is given. The Dirichlet process is then related to the Pitman-Yor 
process, and the hierarchical Pitman-Yor process is also presented.
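For reference, the predictive probability in the hierarchical Pitman-Yor
n-gram model of (Teh 2006) has the back-off form below; notation follows the
paper (c are customer counts, t table counts, d the discount, \theta the
strength parameter, and \pi(u) the context u with its earliest word dropped):

P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}}
            + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\, P(w \mid \pi(u))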

2. Using GATE for Word Polarity in Context Classification
GATE (General Architecture for Text Engineering) is open source software for
creating text processing workflows.  Core GATE includes the tools for solving
many text engineering issues: modeling and persistence of specialized data
structures; measurement, evaluation, benchmarking; visualization and editing of
annotations, ontologies, parse trees, etc.; extraction of training instances
for machine learning; pluggable machine learning implementations.  This
tutorial will show how to use GATE for advanced machine learning applications.
Detecting word polarity in context will be used as an example to show some of
the GATE features.  The tutorial project is based on the latest sentiment
analysis research, specifically the work by Theresa Wilson, Janyce Wiebe, and Paul
Hoffmann, "Recognizing Contextual Polarity: An Exploration of Features for
Phrase-Level Sentiment Analysis", 2009.  Using different features (words, part
of speech, negations, etc.) an SVM classifier is trained and evaluated.

Thank you,

Ivan Provalov


RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-06 Thread Rohit
Thanks Ahmet, let me give this a shot.

Regards,
Rohit



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: 06 May 2011 15:39
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date
String:



--- On Fri, 5/6/11, Rohit ro...@in-rev.com wrote:

 From: Rohit ro...@in-rev.com
 Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date
String:
 To: solr-user@lucene.apache.org
 Date: Friday, May 6, 2011, 8:47 AM
 Hi Craig,
 
 Thanks for the response. Actually what we need to achieve is to see group-by
 results based on dates like:
 
 2011-01-01  23
 2011-01-02  14
 2011-01-03  40
 2011-01-04  10
 
 Now the records in my table run into millions; grouping the result based on
 UTC date would not produce the right result, since the result should be
 grouped on the user's timezone.  Is there any way we can
 achieve this in Solr?

The easiest way would be to create an additional string-typed field and use
copyField to populate it (copy the first 10 characters from the (t)date into
the string).

Then facet on that string field:  facet=on&facet.field=SDATE

<field name="DATE" type="tdate" indexed="true" stored="true"/>
<field name="SDATE" type="string" indexed="true" stored="true"/>

<copyField source="DATE" dest="SDATE" maxChars="10"/>



Replication question

2011-05-06 Thread kenf_nc
I have Replication set up with
  <str name="pollInterval">00:00:60</str>

I assumed that meant it would poll the master for updates once a minute. But
my logs make it look like it is trying to sync up almost constantly. Below
is an example of my log from just 1 minute in time. Am I reading this wrong?
This is from one of the slaves, I have 2 of them so my Master's log file is
double this.

Is this normal?

May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:34:14 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
May 6, 2011 1:35:05 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.




How to pass resultset to stored procedure ? DataImportHandler

2011-05-06 Thread binoybt
Hi 

I am new to Solr. I wrote a stored procedure in Oracle. 

I tried calling it from Solr, but the procedure is not getting executed, as it
expects a resultset as an out param.

CREATE OR REPLACE PROCEDURE GETSEARCHQUERY(p_cursor in out sys_refcursor) AS 
BEGIN 
OPEN p_cursor FOR 

select *   from X where X.id = 10730; 

END GETSEARCHQUERY; 

data-config.xml:

  <entity name="coreY" transformer="TemplateTransformer" pk="id"
          query="{call GETSEARCHQUERY()}"
          deltaQuery="{call GETSEARCHQUERY()}">
  </entity>

Can someone help me?

Thanks 
Binoy




Re: Programmatic restructuring of a Solr cloud

2011-05-06 Thread Sergey Sazonov

Hello Jan,

Thank you very much for the answer. Unfortunately, we don't use Amazon, 
and I doubt we will be able to persuade the customer to switch to it. 
Moreover, the amount of data will not allow us to store everything on a 
single master. However, having considered your design I am starting to 
see the problem in a new light, so maybe it will still prove helpful ;)


In the meanwhile, I'm still looking for other solutions...

Best regards,
Sergey Sazonov.

On 05/05/11 15:07, Jan Høydahl wrote:

Hi,

One approach if you're using Amazon is using BeanStalk

* Create one master with 12 cores, named jan, feb, mar etc
* Every month, you clear the current month index and switch indexing to it
   You will only have one master, because you're only indexing to one month at 
a time
* For each of the 12 months, setup an Amazon BeanStalk instance with a Solr 
replica pointing to its master
   This way, Amazon will spin off replicas as needed
   NOTE: Your replica could still be located at /solr/select even if it 
replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one 
or more shards
   
shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

After this is setup, you have 0 config to worry about :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 14.03, Sergey Sazonov wrote:


Dear Solr Experts,

First of all, I would like to thank you for your patience when answering 
questions of those who are less experienced.

And now to the main topic: I would like to learn whether it is possible to 
restructure a Solr cloud programmatically.

Let me describe the system we are designing to make the requirements clear. The 
indexed documents are certain log entries. We are planning to shard them by 
month, and only keep the last 12 months in the index. We are going to replicate 
each shard across several servers.

Now, the user is always required to search within a single month (= shard). Most 
importantly, we expect an absolute majority of the requests to query the current month, 
with only a minor load on the previous months. In order to utilise the cluster most 
efficiently, we would like a majority of the servers to contain replicas of the current 
month data, and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that migrate from master to master, 
depending on which master holds the data for the current month. When a new month starts, 
those slaves have to be reconfigured to hold the new shard and to replicate from the new 
master (their old master now holding the data for the previous month).

Since this operation has to be done every month, we are naturally considering 
automating it. So my question is whether anyone has faced a similar problem 
before, and what is the best way to solve it. We are not committed to any 
solution, or even architecture, so feel free to propose different solutions. 
The only requirement is that a majority of the servers should be able to serve 
requests to the current month at any given moment.

Thank you in advance for your answers.

Best regards,
Sergey Sazonov.




RE: Solr Terms and Date field issues

2011-05-06 Thread Viswa S

Thanks Erick & Ahmet, that helps.

 Date: Fri, 6 May 2011 09:25:11 -0400
 Subject: Re: Solr Terms and Date field issues
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 
  OK, I'm reaching a little here, but I think it's got a pretty good chance
  of being the issue you're seeing. Sure hope somebody jumps
  in and corrects me if I'm wrong (hint hint)...
  
  I haven't delved into the actual Trie code, this is just from looking
  with TermsComponent and Luke. Using Solr 1.4.1 BTW.
  
  What you're seeing is a consequence of the trie field type with a
  precision step other than 0. Trie fields with precisionStep > 0 add
  extra stuff to the index to allow more efficient range queries. A hint
  about this is that your 5 documents with the tdate type produce
  16 tokens rather than just 5.
 
  If you try your experiment with the date type (which is a trie type with
  precisionStep=0) you'll see exactly what you expect.
  
  So the long and short of it is that Solr's working as expected, and
  you can use your index without worrying. But if you're trying to do
  some lower-level term walking, you'll either have to filter stuff out,
  copy your dates to something with precisionStep=0 and use that
  field, or...
 
 Best
 Erick
 
 On Thu, May 5, 2011 at 9:08 PM, Ahmet Arslan iori...@yahoo.com wrote:
 
 
   It is okay to see weird things in admin/schema.jsp or the terms component with
   trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/
  
   If you really need the terms component, consider using copyField (tdate to
   string type).
 
 
 
 
  Please find attached the schema and some test data (test.xml).
 
  Thanks for looking this.
  Viswa
 
 
  Date: Thu, 5 May 2011 19:08:31 -0400
  Subject: Re: Solr Terms and Date field issues
  From: erickerick...@gmail.com
  To: solr-user@lucene.apache.org
 
  Hmmm, this is puzzling. If you could come up with a couple of xml
  files and a schema
  that illustrate this, I'll see what I can see...
 
  Thanks,
  Erick
 
  On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:
  
   Erik,
  
   I suspected the same, and set up a test instance to reproduce this. The
   date field I used is set up to capture indexing time; in other words the
   schema has a default value of NOW. However, I have reproduced this
   issue with fields which do not have defaults too.
  
    On the second one, I did a delete-commit (with expungeDeletes=true) and
    then an optimize. All other fields show updated terms except the date
    fields. I have also double-checked to see if the Luke handler has any
    different terms, and it did not.
  
  
   Thanks
   Viswa
  
  
   Date: Wed, 4 May 2011 08:17:39 -0400
   Subject: Re: Solr Terms and Date field issues
   From: erickerick...@gmail.com
   To: solr-user@lucene.apache.org
  
   Hmmm, this *looks* like you've changed your schema without
   re-indexing all your data so you're getting old (string?) values in
   that field, but that's just a guess. If this is really happening on a
   clean index it's a problem.
  
   I'm also going to guess that you're not really deleting the documents
   you think. Are you committing after the deletes?
  
   Best
   Erick
  
   On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:
   
Hello,
   
 The terms query for a date field seems to get populated with some
 weird dates; many of these dates (1970, 2009, 2011-04-23) are not
 present in the indexed data.  Please see the sample data below.
    
 I also notice that a delete and optimize does not remove the relevant
 terms for date fields; the string fields seem to work fine.
   
Thanks
Viswa
   
     Results from the Terms component:

     <int name="2011-05-04T02:01:32.928Z">3479</int>
     <int name="2011-05-04T02:00:19.2Z">3479</int>
     <int name="2011-05-03T22:34:58.432Z">3479</int>
     <int name="2011-04-23T01:36:14.336Z">3479</int>
     <int name="2009-03-13T13:23:01.248Z">3479</int>
     <int name="1970-01-01T00:00:00Z">3479</int>
     <int name="1970-01-01T00:00:00Z">3479</int>
     <int name="1970-01-01T00:00:00Z">3479</int>
     <int name="1970-01-01T00:00:00Z">3479</int>
     <int name="2011-05-04T02:01:34.592Z">265</int>

     Result from the facet component, rounded to seconds:

     <lst name="InsertTime">
     <int name="2011-05-04T02:01:32Z">1</int>
     <int name="2011-05-04T02:01:33Z">1148</int>
     <int name="2011-05-04T02:01:34Z">2333</int>
     <str name="gap">+1SECOND</str>
     <date name="start">2011-05-03T06:14:14Z</date>
     <date name="end">2011-05-04T06:14:14Z</date>
     </lst>
   
  
 
 
 
  

Re: Use Solr / Lucene to search in a Logfile

2011-05-06 Thread Robert Naczinski
Hi,

thanks for the reply. I did not know that.

Is there still a way to use Solr or Lucene? Or Apache Nutch would not be bad.

Could I maybe write a customized DIH?

Greetings,

Robert

2011/5/6 Otis Gospodnetic otis_gospodne...@yahoo.com:
 Loggly.com


Re: Field names with a period (.)

2011-05-06 Thread Chris Hostetter

: I remember the same, except I think I've seen the recommendation that you
: make all the letters lower-case. As I remember, there are some interesting
: edge cases that you might run into later with upper case.

i can't think of *any* reason why upper case character names in a field
would cause you problems.

In general, the low level guts of Solr don't care what characters you use
in a field name.

where people run into problems is that some specific features of solr
either have limitations in what they can deal with in a field name, or
work in some way that makes certain characters extremely frustrating to
use.

the simplest example of frustration is in needing to URL escape any
special characters when building a URL that contains a field name (ie: as
the value of a facet.field param for example)

an example of a hard limitation is sorting: the sort param expects
whitespace separated lists of "fieldname asc|desc" pairs -- if your
field name contains whitespace, that can screw you up.

the lucene QueryParser is another situation where punctuation and
whitespace are significant, so having those characters in your field names
may cause you problems (i think in most cases they can be backslash
escaped, but i'm not certain)

As far as the specific question about "." in field names -- i can't think
of any feature that would break on that ... the only thing that comes to
mind as a possibility is using per-field override params (ie: if the field
name is foo.bar and you want to use
facet.field=foo.bar&f.foo.bar.facet.prefix=xxx) ... but even then, i'm
pretty sure it would work fine (you and the other people maintaining your
code might get really confused however)


-Hoss


Re: Field names with a period (.)

2011-05-06 Thread Gora Mohanty
On Sat, May 7, 2011 at 1:29 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : I remember the same, except I think I've seen the recommendation that you
 : make all the letters lower-case. As I remember, there are some interesting
 : edge cases that you might run into later with upper case.

 i can't think of *any* reason why upper case character names in a field
 would cause you problems.
[...]

Will second that in so far as we have been using camelCase
for several months now without issues. I would like to hear
about any edge cases here.

Other than that, we have always stuck to a-z, A-Z, so I
cannot comment directly from experience about any
issues with other characters.

Regards,
Gora


custom types file for WordDelimiterFilterFactory

2011-05-06 Thread Jerry Mindek
Hi there,

 

I would like to experiment with the custom types file introduced in
solr-2059.

 

I have copied the wdftypes.txt file from SVN and put it in my
solrhome/solr/conf directory.

However, it doesn't appear to me that the WordDelimiterFilterFactory is
using it.

 

Have I put it in the correct path?

Is there an argument to the WordDelimiterFilterFactory that I must provide
in order for the file to be used?

How do I verify that it is in use? Will I see it in the analyzer?

 

Thanks,

Jerry Mindek
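
The types file is only consulted when it is named explicitly on the
filter via the types attribute added in SOLR-2059 -- dropping it into
conf/ is not enough. A minimal sketch (field type name illustrative):

<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "types" is resolved relative to the conf/ directory -->
    <filter class="solr.WordDelimiterFilterFactory"
            types="wdftypes.txt"
            generateWordParts="1"
            generateNumberParts="1"/>
  </analyzer>
</fieldType>

With that in place, the analysis page should show the remapped character
types taking effect.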

 



Replication Clarification Please

2011-05-06 Thread Ravi Solr
Hello,
    Pardon me if this has been already answered somewhere and I
apologize for a lengthy post. I was wondering if anybody could help me
understand Replication internals a bit more. We have a single
master-slave setup (solr 1.4.1) with the configurations as shown
below. Our environment is quite commit heavy (almost 100s of docs
every 5 minutes), and all indexing is done on Master and all searches
go to the Slave. We are seeing that the slave replication performance
gradually degrades, the speed drops below 1 kbps, and replication ultimately
gets backed up. Once we reload the core on the slave it will work fine
for some time and then it gets backed up again. We have mergeFactor set
to 10 and ramBufferSizeMB is set to 32MB and solr itself is running
with 2GB memory and locktype is simple on both master and slave.

I am hoping that the following questions might help me understand the
replication performance issue better (Replication Configuration is
given at the end of the email)

1. Does the Slave get the whole index every time during replication or
just the delta since the last replication happened ?

2. If a huge number of queries are being run on the slave, will that
affect the replication? How can I improve the performance? (see the
replication details at the bottom of this email)

3. Will the segment names be the same on master and slave after
replication? I see that they are different. Is this correct? If it
is correct, how does the slave know what to fetch the next time, i.e.
the delta?

4. When and why does the index.TIMESTAMP folder get created ? I see
this type of folder getting created only on slave and the slave
instance is pointing to it.

5. Does replication process copy both the index and index.TIMESTAMP folder ?

6. What happens if the replication kicks off before the previous
invocation has completed? Will the 2nd invocation block, or will
it go through, causing more confusion?

7. If I have to prep a new master-slave combination, is it OK to copy
the respective contents into the new master and slave and start solr? Or
do I have to wipe the new slave and let it replicate from its new
master?

8. Doing an 'ls | wc -l' on the index folder of master and slave gave 194
and 17968 respectively... the slave has a lot of segments_xxx files. Is
this normal?

MASTER

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">optimize</str>

        <str name="confFiles">schema.xml,stopwords.txt</str>
        <str name="commitReserveDuration">00:00:10</str>
    </lst>
</requestHandler>


SLAVE

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
        <str name="masterUrl">master core url</str>
        <str name="pollInterval">00:03:00</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">1</str>
    </lst>
</requestHandler>


REPLICATION DETAILS FROM PAGE

Master     master core url
Poll Interval     00:03:00
Local Index     Index Version: 1296217104577, Generation: 20190
    Location: /data/solr/core/search-data/index.20110429042508
    Size: 2.1 GB
    Times Replicated Since Startup: 672
    Previous Replication Done At: Fri May 06 15:41:01 EDT 2011
    Config Files Replicated At: null
    Config Files Replicated: null
    Times Config Files Replicated Since Startup: null
    Next Replication Cycle At: Fri May 06 15:44:00 EDT 2011
    Current Replication Status     Start Time: Fri May 06 15:41:00 EDT 2011
    Files Downloaded: 43 / 197
    Downloaded: 477.08 KB / 588.82 MB [0.0%]
    Downloading File: _hdm.prx, Downloaded: 9.3 KB / 9.3 KB [100.0%]
    Time Elapsed: 967s, Estimated Time Remaining: 1221166s, Speed: 505 bytes/s


Ravi Kiran Bhaskar


RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-06 Thread Chris Hostetter

: Thanks for the response, actually what we need to achieve is to see group-by
: results based on dates like,
: 
: 2011-01-01  23
: 2011-01-02  14
: 2011-01-03  40
: 2011-01-04  10
: 
: Now the records in my table run into millions; grouping the result based on
: UTC date would not produce the right result since the result should be
: grouped on the user's timezone.  Is there any way we can achieve this in Solr?

Date faceting is entirely driven by query params, so if you index your 
events using the true time that they happened at (formatted as a string 
in UTC) you can then select your date ranges using whatever timezone 
offset is specified by your user at query time as a UTC offset.

facet.range = dateField
facet.range.start = 2011-01-01T00:00:00Z+${useroffset}MINUTES
facet.range.gap = +1DAY
etc...


-Hoss
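
As a concrete sketch of the request (field name and offset illustrative --
here a user at UTC-05:00, i.e. an offset of -300 minutes; note that a
literal "+" in a URL must be encoded as %2B):

.../select?q=*:*&rows=0&facet=true
    &facet.range=dateField
    &facet.range.start=2011-01-01T00:00:00Z-300MINUTES
    &facet.range.end=2011-02-01T00:00:00Z-300MINUTES
    &facet.range.gap=%2B1DAY

Each returned bucket then starts at the user's local midnight, so the
per-day counts line up with the user's calendar days.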


*:* query with dismax

2011-05-06 Thread Jason Chaffee
I am using dismax and trying to use q=*:* to return all indexed
documents.  However, it is always returning 0 found.

 

If I use the default select (not dismax) handler and try q=*:* then it
returns all documents.

 

There is nothing in the logs to indicate why this is happening.  

 

Does anyone have any clues?

 

Thanks,

 

Jason



Re: *:* query with dismax

2011-05-06 Thread Mark Mandel
This is exactly what should be happening, as the dismax parser doesn't
understand regular query syntax (and for good reason too). This tripped me
up as well when I first started using dismax.

Solution for me was to configure the handler to use *:* when the query is
empty, so that you can still get back a full result set if you need it, say
for faceting.

HTH

Mark
On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote:
 I am using dismax and trying to use q=*:* to return all indexed
 documents. However, it is always returning 0 found.



 If I used the default select (not dismax) handler and try q=*:* then it
 returns all documents.



 There is nothing in the logs to indicate why this is happening.



 Does anyone have any clues?



 Thanks,



 Jason



RE: *:* query with dismax

2011-05-06 Thread Jason Chaffee
Can you shed some light on what you did to configure it to handle *:*?
I have the same issue that I need it to work for faceting, but I do need
the dismax abilities as well.

-Original Message-
From: Mark Mandel [mailto:mark.man...@gmail.com] 
Sent: Friday, May 06, 2011 4:30 PM
To: solr-user@lucene.apache.org
Subject: Re: *:* query with dismax

This is exactly what should be happening, as the dismax parser doesn't
understand regular query syntax (and for good reason too). This tripped
me
up as well when I first started using dismax.

Solution for me was to configure the handler to use *:* when the query
is
empty, so that you can still get back a full result set if you need it,
say
for faceting.

HTH

Mark
On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote:
 I am using dismax and trying to use q=*:* to return all indexed
 documents. However, it is always returning 0 found.



 If I used the default select (not dismax) handler and try q=*:* then
it
 returns all documents.



 There is nothing in the logs to indicate why this is happening.



 Does anyone have any clues?



 Thanks,



 Jason



edismax available in solr 3.1?

2011-05-06 Thread cyang2010
Hi,

is edismax available in solr 3.1?  I don't see any documentation about it.

if it is, does it support the prefix and fuzzy query?


Thanks,


cy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/edismax-available-in-solr-3-1-tp2910613p2910613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: *:* query with dismax

2011-05-06 Thread Rob Casson
it does seem a little weird, but q.alt will get what you want:

  http://wiki.apache.org/solr/DisMaxQParserPlugin#q.alt

hth,
rc

On Fri, May 6, 2011 at 7:41 PM, Jason Chaffee jchaf...@ebates.com wrote:
 Can you shed some light on what you did to configure it to handle *:*?
 I have the same issue that I need it to work for faceting, but I do need
 the dismax abilities as well.

 -Original Message-
 From: Mark Mandel [mailto:mark.man...@gmail.com]
 Sent: Friday, May 06, 2011 4:30 PM
 To: solr-user@lucene.apache.org
 Subject: Re: *:* query with dismax

 This is exactly what should be happening, as the dismax parser doesn't
 understand regular query syntax (and for good reason too). This tripped
 me
 up as well when I first started using dismax.

 Solution for me was to configure the handler to use *:* when the query
 is
 empty, so that you can still get back a full result set if you need it,
 say
 for faceting.

 HTH

 Mark
 On May 7, 2011 9:22 AM, Jason Chaffee jchaf...@ebates.com wrote:
 I am using dismax and trying to use q=*:* to return all indexed
 documents. However, it is always returning 0 found.



 If I used the default select (not dismax) handler and try q=*:* then
 it
 returns all documents.



 There is nothing in the logs to indicate why this is happening.



 Does anyone have any clues?



 Thanks,



 Jason
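
Putting the q.alt suggestion together, a minimal sketch of the handler
config (handler name and qf fields illustrative, not from the thread):

<requestHandler name="/dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 body</str>
    <!-- q.alt is parsed by the standard parser and kicks in when q is
         missing or empty, so *:* gives a match-all result set -->
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

A faceting-only request can then simply omit q, e.g.
.../dismax?rows=0&facet=true&facet.field=category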




Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread cyang2010
Hi, 

When user-entered text contains special characters, can this be taken care
of by the tokenizer/filter configured for the field?

In application code, do I need to parse the user input string and add an
escape in front of those special characters?  If so, will those special
characters differ for different languages, such as English versus Chinese?

As of now, I don't parse those special characters, and I am getting
inconsistent/strange behavior/errors.  For example:

1. search: title_name_en_US:(my! god)
solr thinks the second term "god" is something NOT to include; why is that?

<lst name="debug">
  <str name="rawquerystring">title_name_en_US:(my! god)</str>
  <str name="querystring">title_name_en_US:(my! god)</str>
  <str name="parsedquery">title_name_en_US:my -title_name_en_US:god</str>
  <str name="parsedquery_toString">title_name_en_US:my -title_name_en_US:god</str>

2. search: title_name_en_US:my!
solr returns an error instead, which is even worse:

INFO: [titles] webapp=/solr path=/select
params={explainOther=&fl=*,score&debugQuery=on&indent=on&start=0&q=title_name_en_US:(Oh!)&hl.fl=&qt=standard&wt=standard&fq=&rows=10&version=2.2}
status=400 QTime=0
May 7, 2011 2:13:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.queryParser.ParseException: Cannot parse
'title_name_en_US:Oh!': Encountered "<EOF>" at line 1, column 20.
Was expecting one of:
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    "*" ...

        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse
'title_name_en_US:my!': Encountered "<EOF>" at line 1, column 20.
Was expecting one of:
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    <TERM> ...
    "*" ...

        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:205)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-special-character-is-handled-differently-by-standard-lucene-query-parser-tp2910692p2910692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread Yonik Seeley
On Fri, May 6, 2011 at 10:35 PM, cyang2010 ysxsu...@hotmail.com wrote:
 When user entered text contains special character, can this being taken care
 by the tokenizer/filter configured at the field?

 In application code, Do i need to parse the user input string and add the
 escape in front of those special character?  If so, will those special
 characters differ for different language, such as english versus chinese?

 As of now, I didn't parse those special character.  i am getting this
 inconsistent/strange behavior/error.  For example:

 1. search: title_name_en_US:(my! god)
 solr thinks the second term god is something NOT to include, why is that?

! is a synonym for the NOT operator in lucene query parser syntax.
The fact that it's treated as an operator even when followed by
whitespace is a bug.
This was fixed by LUCENE-2566 (which is in the trunk version, but not 3.1)

One workaround is to escape the ! or quote the term:
title_name_en_US:(my\! god)
title_name_en_US:("my!" god)

In general, the lucene query parser isn't meant for directly handling
literal user queries since it has a more strict syntax (like SQL).
Something like the dismax or edismax may help (try adding
defType=dismax to your request).  They are designed to try and never
throw exceptions.


-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread cyang2010
I know about dismax.  But with that, I can't perform prefix and fuzzy queries.
Can edismax handle prefix and fuzzy queries?

My application logic just passes the user-entered text to the Solr server to
perform term, phrase, prefix, and fuzzy queries.  And I don't want to escape
the special characters by parsing the Java string, since I might have to deal
with different language sets.  That is why I also ask whether those special
characters are language specific or agnostic.

Looking for your answers.


cy

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-special-character-is-handled-differently-by-standard-lucene-query-parser-tp2910692p2910809.html
Sent from the Solr - User mailing list archive at Nabble.com.
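
On the escaping question raised above: the query parser's special
characters are fixed ASCII syntax, not locale-dependent, so the same
escaping applies whatever language the text is in. As a sketch, SolrJ
ships a helper that backslash-escapes them (field name illustrative):

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
    public static void main(String[] args) {
        // user-entered text, otherwise passed through verbatim
        String userInput = "my! god";

        // backslash-escapes lucene query syntax characters
        // such as ! ( ) : ^ [ ] { } ~ * ?
        String escaped = ClientUtils.escapeQueryChars(userInput);

        System.out.println("title_name_en_US:(" + escaped + ")");
    }
}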