RE: Out of memory

2011-09-14 Thread Rohit
Thanks Jaeger.

Actually I am storing Twitter streaming data in the core, so the indexing
rate is about 12 tweets (docs)/second. The same Solr instance contains 3 other
cores, but these cores are not very heavy. Now the Twitter core has become very
large (77516851 docs) and it's taking a long time to query (mostly facet queries
based on date and string fields).

After some time, about 18-20 hours, Solr goes out of memory, and the thread dump
doesn't show anything. How can I improve this besides adding more RAM to
the system?



Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: 13 September 2011 21:06
To: solr-user@lucene.apache.org
Subject: RE: Out of memory

numDocs is not the number of documents in memory.  It is the number of
documents currently in the index (which is kept on disk).  Same goes for
maxDocs, except that it is a count of all of the documents that have ever
been in the index since it was created or optimized (including deleted
documents).
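
To make the distinction concrete, a small sketch using the stats quoted below
(plain Java; nothing here touches Solr):

public class IndexStats {
    public static void main(String[] args) {
        long numDocs = 77516851L; // documents currently searchable
        long maxDoc  = 77518729L; // also counts deleted docs not yet merged away
        // The gap is disk-resident bookkeeping, not documents held in memory:
        System.out.println("deleted docs still in the index: " + (maxDoc - numDocs)); // 1878
    }
}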

Your subject indicates that something is giving you some kind of Out of
memory error.  We might better be able to help you if you provide more
information about your exact problem.

JRJ


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Tuesday, September 13, 2011 2:29 PM
To: solr-user@lucene.apache.org
Subject: Out of memory

I have Solr running on a machine with 18GB RAM, with 4 cores. One of the
cores is very big, containing 77516851 docs; the stats for the searcher are given
below.

 

searcherName : Searcher@5a578998 main 
caching : true 
numDocs : 77516851 
maxDoc : 77518729 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5a9c5842 
indexVersion : 1308817281798 
openedAt : Tue Sep 13 18:59:52 GMT 2011 
registeredAt : Tue Sep 13 19:00:55 GMT 2011 
warmupTime : 63139

 

- Is there a way to reduce the number of docs loaded into memory for
this core?

- At any given time I don't need data older than the past 15 days, unless
someone queries for it explicitly. How can this be achieved?

- Will it be better to go for Solr replication or distribution if
there is little option left?

 

 

Regards,

Rohit

Mobile: +91-9901768202

About Me: http://about.me/rohitg
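
One way to serve the "past 15 days" case above is a date filter query; a
minimal SolrJ sketch (assuming SolrJ 3.x; the field name createdAt is
illustrative, not from the actual schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RecentTweets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/twitter");
        SolrQuery q = new SolrQuery("*:*");
        // Filter queries are cached separately and keep the main query cheap;
        // "createdAt" is a hypothetical date field.
        q.addFilterQuery("createdAt:[NOW/DAY-15DAYS TO NOW]");
        QueryResponse rsp = server.query(q);
        System.out.println("hits in the last 15 days: " + rsp.getResults().getNumFound());
    }
}

Older data stays queryable simply by omitting the filter.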

 



EofException with Solr in Jetty

2011-09-14 Thread Michael Szalay
Hi all

Sometimes we have this error in our system. We are running Solr 3.1.0 on
Jetty 7.2.2.

Does anyone have an idea how to tune this?

14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | apache.solr.common.SolrException 151 | 154 - mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 0 | org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96)
at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473)
at org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929)
at org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
at org.eclipse.jetty.server.Server.handle(Server.java:352)
at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1051)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:590)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:212)
at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:508)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:451)
at java.lang.Thread.run(Thread.java:662)
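
The stack shows Jetty hitting end-of-file while writing the response, which
usually means the client closed the connection before the response was fully
written. If the client is SolrJ, a sketch (assuming SolrJ 3.x) of raising the
client-side timeouts so slow responses are not abandoned midway:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ClientTimeouts {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        server.setConnectionTimeout(5000); // ms to establish the connection
        server.setSoTimeout(120000);       // ms to wait for data; too low a value
                                           // makes clients hang up mid-response
    }
}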

-- 
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business 



solr 1.4 highlighting issue

2011-09-14 Thread Dmitry Kan
Hello list,

Not sure how many of you are still using Solr 1.4 in production, but here is
an issue with highlighting that we've noticed.

The query is:

(drill AND ships) OR rigs


Excerpt from the highlighting list:

<arr name="Contents">
  <str>
    Within the fleet of 27 floating <em>rigs</em> (semisubmersibles and
    drillships) are 21 deepwater <em>drilling</em>
  </str>
</arr>
</lst>



Why did Solr highlight "drilling" even though there is no "ships" in the
text?

--
Regards,

Dmitry Kan


RE: Weird behaviors with not operators.

2011-09-14 Thread electroyou
Thank you a lot for your answers! They helped me understand better how the query
parser works.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Weird-behaviors-with-not-operators-tp3323065p3335087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 1.4 facet.limit behaviour in merging from several shards

2011-09-14 Thread Dmitry Kan
Hi Chris,

Thanks for taking this. Sorry for my confusing explanation. Since you
requested a bigger picture, I'll give some more detail. In short: we don't
do date facets, and sorting by date in reverse order happens naturally by
design.

All the data is split into shards. We use logical sharding, not hash-based.
Each shard contains the piece of data that corresponds to a specific date range.
We know in advance which date range is represented by which shard. Each
document in a shard has a field which contains a date in milliseconds, the
result of subtracting the original document's date from a very big date in
the future. This way, if you issue a facet query against a shard
and use facet.method=index, you get hits from the shard ordered
lexicographically in reverse date order.

Here is an example of two values:

9223370739060532807_docid1
9223370741484545807_docid2

The second value is larger than the first, which means that the document
itself is older.


Here is a typical facet query:

wt=xml&start=0&hl.alternateField=Contents&version=1&df=Contents&q=aerospace+engineer&hl.alternateFieldLength=10&facet=true&f.OppositeDateLongNumber_docid.facet.limit=1000&facet.field=OppositeDateLongNumber_docid&rows=1&facet.sort=index&facet.zeros=false&isShard=true

The output xml is:

(skipping the header)

 <lst name="facet_fields">
  <lst name="OppositeDateLongNumber_docid">
    <int name="9223370722475651807_1">2</int>
    <int name="9223370722825037807_4">1</int>
    <int name="9223370723175759807_2">2</int>
    <int name="9223370723372652807_10">1</int>
    <int name="9223370723949606807_7">1</int>
  </lst>
 </lst>


Excerpt from the schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           omitNorms="true">
  <analyzer type="index">
    <!-- the order matters -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"
            withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2"
            maxFractionAsterisk="0.33"/>
    <!-- here we have two more proprietary filters, one of which does
         stemming -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- our proprietary stemming filter -->
  </analyzer>
</fieldType>

<field name="OppositeDateLongNumber_docid" type="string" indexed="true"
       stored="true" required="false" omitNorms="true" />
<field name="Contents" type="text" indexed="true" stored="true"
       omitNorms="true" />


Back to the problem: it has been reproducible that if a query run from the
Solr router reaches two or more shards, each of which generates around
1000 hits, then upon merging some portion of hits (on the time border between
two shards) gets dropped. The result hit list is otherwise uniform,
except for the missing portion of hits in the middle.

So the question is: if the facet search reaches two or more shards and each
shard generates 1000 results, which entries will go into the final list of
resulting entries, given the facet.limit=1000 set on the original
distributed query? What is the algorithm in this case?

Please let me know, if something is not clear or more detail is needed from
schema / execution / design.

Regards,

Dmitry

On Fri, Sep 9, 2011 at 12:22 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : When shooting a distributed query, we use facet.limit=1000. Then the
 merging
 : SOLR combines the results. We also use facet.zeros=false to ensure
 returning
 : only non-zero facet entries.
 : The issue that we found is that there was a gap in time in the final
 results
 : list (reverse sorted by date attached to each entry in all the shards),
 : whereby entries stamped with certain date disappeared. If we use
 different
 : query criteria, that produces less than 1000 results both in each of the
 : shards and combined, we see those missing entries. So the problem is
 not
 : in missing data, but in the combination algorithm.

 I don't understand what you mean by "entries stamped with certain date"
 ... are you saying the actual results of the search seem to be missing
 documents, or that the facet counts returned seemed to be missing
 constraints that should be in the list?

 it seems like you are referring to documents missing from the actual
 results ("reverse sorted by date") but facet.limit can't affect anything
 about the results of the actual query.  facet.limit also only applies to
 facet.field (not facet.date or facet.range), but you're talking about a
 date field

 can you please be specific about the requests you are executing (ie: what
 params), the schema you have (ie: what are the fields/types in use in all
 the params/query strings), the results you are getting, and the results
 you are expecting?  Actually providing the response xml is very helpful.
 (change the "fl" to hide any fields you consider sensitive)

 -Hoss





Re: Out of memory

2011-09-14 Thread Dmitry Kan
Hi Rohit,

Do you use caching?
How big is your index in size on the disk?
What is the stack trace contents?

The OOM problems that we have seen so far were related to the
index's physical size and the usage of caching. I don't think we ever found
the exact cause of these problems, but sharding has helped to keep each
index relatively small, and the OOMs have gone away.

You can also attach jconsole to your Solr via JMX and monitor the
memory / CPU usage in a graphical interface. I have also run the garbage
collector manually through jconsole sometimes, and it has been of help.

Regards,
Dmitry



Re: question about Field Collapsing/ grouping

2011-09-14 Thread Ahson Iqbal
Hi Jayendra

Thanks a lot for your response. Now I have two questions. First, to get the
count of groups, is it necessary to apply the specified patch? If so, can you help me
a little with the steps to apply that patch, as I am new to Solr/Java.

Regards
Ahsan



- Original Message -
From: Jayendra Patil jayendra.patil@gmail.com
To: solr-user@lucene.apache.org; Ahson Iqbal mianah...@yahoo.com
Cc: 
Sent: Tuesday, September 13, 2011 10:55 AM
Subject: Re: question about Field Collapsing/ grouping

At the time we implemented the feature, there was no straightforward solution.

What we did was facet on the grouped-by field and count the facets.
This would give you the distinct count for the groups.

You may also want to check the patch at
https://issues.apache.org/jira/browse/SOLR-2242, which will return the
facet counts, and then you count them yourself.

Regards,
Jayendra

On Tue, Sep 13, 2011 at 1:27 AM, Ahson Iqbal mianah...@yahoo.com wrote:
 Hi

 Is it possible to get the number of groups that matched a specified query?

 Say, for example, there are three fields in the index:

 DocumentID
 Content
 Industry


 and now I want to query as +(Content:is Content:the)
 &group=true&group.field=industry

 Now, is it possible to get how many industries matched the specified query?

 Please help.

 Regards
 Ahsan



Re: DIH delta last_index_time

2011-09-14 Thread Gora Mohanty
On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
maria.vazq...@dexone.com wrote:
 Hi,
 How do you handle the situation where the time on the server running Solr
 doesn't match the time in the database?

Firstly, why is that the case? NTP is pretty universal
these days.

 I'm using the last_index_time saved by Solr in the delta query, checking it
 against the lastModifiedDate field in the database, but the times are not in sync,
 so I might lose some changes.
 Can we use something else other than last_index_time? Maybe something like
 last_pk or something.

One possible way is to edit dataimport.properties, manually or through
a script, to put the last_index_time back to a safe value.
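
A sketch of the scripted variant (assuming the default DIH timestamp format
yyyy-MM-dd HH:mm:ss; the path and the 10-minute safety margin are illustrative):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

public class RewindLastIndexTime {
    public static void main(String[] args) throws Exception {
        File f = new File("conf/dataimport.properties");
        Properties props = new Properties();
        InputStream in = new FileInputStream(f);
        props.load(in);
        in.close();

        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date last = fmt.parse(props.getProperty("last_index_time"));
        // Rewind by a safety margin (10 minutes here) to absorb clock skew.
        Date safe = new Date(last.getTime() - 10 * 60 * 1000L);
        props.setProperty("last_index_time", fmt.format(safe));

        OutputStream out = new FileOutputStream(f);
        props.store(out, "rewound to a safe value");
        out.close();
    }
}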

Regards,
Gora


RE: Out of memory

2011-09-14 Thread Rohit
Hi Dmitry,

To answer your questions:

-Do you use caching?
I do use caching, but I will disable it and give it a go.

-How big is your index in size on the disk?
These are the size of the data folder for each of the cores.
Core1 : 64GB
Core2 : 6.1GB
Core3 : 7.9GB
Core4 : 1.9GB

Will try attaching jconsole to my Solr as suggested to get a better picture.

Regards,
Rohit




Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?

2011-09-14 Thread crisfromnova
Hi,

I use the following fieldType:
<fieldType name="text_general_rev" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"
            withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory"
            withOriginal="true"/>
  </analyzer>
</fieldType>

What I want is to find "autocar" when I'm searching for "auto*", for example,
but no leading-wildcard match is returned. When I check the fieldType with Analysis,
I get this:


Index Analyzer
autocar
autocar
autocar
#1;racotua
autocar
Query Analyzer
car
car
car
car
#1;rac
car

So, using "car*" for my search, shouldn't it become "rac*" and match "racotua"?
Even if I search for "rac*", "autocar" is not found.

Searching with "*car*" is very expensive, so I'm trying to generate the
reversed string and search for it. Is there a working configuration to accomplish
this?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shouldn-t-ReversedWildcardFilterFactory-resolve-leadingWildcard-tp3335240p3335240.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: question about Field Collapsing/ grouping

2011-09-14 Thread Jayendra Patil
Hi Ahson,

http://wiki.apache.org/solr/FieldCollapsing
group.ngroups seems to have been added as a parameter, so you may not
need to apply any patches.

Solr 3.3 released the grouping feature, so I presume group.ngroups
should already be included in it.
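
A sketch of asking for it from SolrJ, reusing the query from earlier in the
thread (assuming a Solr version in which group.ngroups is supported):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupCount {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("+(Content:is Content:the)");
        q.setParam("group", "true");
        q.setParam("group.field", "industry");
        q.setParam("group.ngroups", "true"); // the number of distinct groups matched
        QueryResponse rsp = server.query(q);
        // ngroups appears inside the "grouped" section of the response:
        System.out.println(rsp.getResponse().get("grouped"));
    }
}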

Regards,
Jayendra





Re: How to plug a new ANTLR grammar

2011-09-14 Thread Roman Chyla
Hi Peter,

Yes, with the tree it is pretty straightforward. I'd prefer to do it
that way, but what is the purpose of the new qParser then? Is it just
that the qParser was built with different paradigms in mind, where
the parse tree was not in the equation? Does anybody know if there is any
advantage?

I looked a bit more into the contrib:

org.apache.lucene.queryParser.standard.StandardQueryParser.java
org.apache.lucene.queryParser.standard.QueryParserWrapper.java

And some things there (like setting the default fuzzy value) are in my
case set directly in the grammar. So the query builder is still
somehow involved in parsing (IMHO not good).

But if someone knows some reasons to keep using the qParser, please
let me know.

Also, a question for Peter, at which stage do you use lucene analyzers
on the query? After it was parsed into the tree, or before we start
processing the query string?

Thanks!

  Roman





On Tue, Sep 13, 2011 at 10:14 PM, Peter Keegan peterlkee...@gmail.com wrote:
 Roman,

 I'm not familiar with the contrib, but you can write your own Java code to
 create Query objects from the tree produced by your lexer and parser
 something like this:

 StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(
     new ANTLRReaderStream(new StringReader(queryString)));
 CommonTokenStream tokens = new CommonTokenStream(lexer);
 StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
 StandardLuceneGrammarParser.query_return ret = parser.mainQ();
 CommonTree t = (CommonTree) ret.getTree();
 parseTree(t);

 void parseTree(Tree t) {
     // visit this node, then recursively walk its children
     visit(t);
     for (int i = 0; i < t.getChildCount(); i++) {
         parseTree(t.getChild(i));
     }
 }

 void visit(Tree node) {
     switch (node.getType()) {
     case StandardLuceneGrammarParser.AND:
         // Create BooleanQuery, push onto stack
         ...
     }
 }

 I use the stack to build up the final Query from the queries produced in the
 tree parsing.

 Hope this helps.
 Peter


 On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy jason...@gmail.com wrote:

 I'd love to see the progress on this.

 On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com
 wrote:

  Hi,
 
  The standard lucene/solr parsing is nice but not really flexible. I
  saw questions and discussion about ANTLR, but unfortunately never a
  working grammar, so... maybe you find this useful:
 
 
 https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr
 
  In the grammar, the parsing is completely abstracted from the Lucene
  objects, and the parser is not mixed with Java code. At first it
  produces structures like this:
 
 
 https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html
 
  But now I have a problem. I don't know if I should use query parsing
  framework in contrib.
 
  It seems that the qParser in contrib can use different parser
  generators (the default JavaCC, but also ANTLR). But I am confused and
  I don't understand this new queryParser from contrib. It is really
  very confusing to me. Is there any benefit in trying to plug the ANTLR
  tree into it? Because looking at the AST pictures, it seems that with
  a relatively simple tree walker we could build the same queries as the
  current standard lucene query parser. And it would be much simpler and
  flexible. Does it bring something new? I have a feeling I miss
  something...
 
  Many thanks for help,
 
   Roman
 



 --
 - sent from my mobile
 6176064373




Re: index not created

2011-09-14 Thread kumar8anuj
Hi Erick,
I have not done anything different. I downloaded the Solr tar
from one of the mirrors and extracted it in the home directory, started
Jetty, and it works fine.
For Tomcat, I copied the war file into my webapps folder, restarted
Tomcat, changed the configuration to point it to my Solr dir, and started it
again. It is the same setup; everything is the same. This time I even tried it
with the example solr folder without the multicore setup, and in solrconfig.xml
all the lib paths are the same as they were for Jetty. But still nothing is
getting indexed: it shows that 1 document is there, but the text field doesn't
show anything in it, and nothing comes back when I search for something from the
document.
Am I doing something wrong? Please let me know. I have to implement this
ASAP. Please help me, or if you can give me a document for implementing the same
in Tomcat, I will try that way.


Thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/index-not-created-tp3300744p3335291.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?

2011-09-14 Thread crisfromnova
I found a partial solution.
Using ReverseStringFilterFactory instead of ReversedWildcardFilterFactory and
searching for "rac*" will find "autocar", for example.
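
For reference, a small sketch (plain Java; not tied to any Solr API) of the
rewrite this relies on: with ReverseStringFilterFactory every token is indexed
reversed, so a leading-wildcard search has to be reversed by the caller before
it is sent:

public class ReversedPrefixQuery {
    // Rewrites a leading-wildcard search such as "*car" into the prefix
    // query "rac*" that matches the reversed index terms.
    static String toReversedPrefix(String leadingWildcard) {
        String term = leadingWildcard.substring(1); // drop the leading '*'
        return new StringBuilder(term).reverse().append('*').toString();
    }

    public static void main(String[] args) {
        System.out.println(toReversedPrefix("*car")); // prints rac*
    }
}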

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shouldn-t-ReversedWildcardFilterFactory-resolve-leadingWildcard-tp3335240p3335307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Schema fieldType y-m-d ?!?!

2011-09-14 Thread stockii
Is it possible to index a date field in the format y-m-d? I don't need
the timestamp, so I can save myself some space.


Which ways exist to search with a complex date filter?
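
One approach (a sketch, not from the thread): Solr's DateField always stores a
full timestamp, but indexing values rounded to midnight UTC gives day
granularity in practice, and date math then covers complex filters:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DayGranularity {
    public static void main(String[] args) {
        // Index the value rounded to midnight UTC; Solr stores a full
        // timestamp either way, but every doc of a day shares one term.
        SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd");
        day.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(day.format(new Date()) + "T00:00:00Z");
        // Query side, date math gives day-resolution filters, e.g.
        // createdAt:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]  ("createdAt" is illustrative)
    }
}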

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-fieldType-y-m-d-tp3335359p3335359.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Out of memory

2011-09-14 Thread Dmitry Kan
Hi,

OK, 64GB fits into one shard quite nicely in our setup, but I have never used
a multicore setup. In total you have 79.9 GB. We try to have 70-100GB per
shard with caching on. Do you warm up your index on startup? Also,
there was a setting for pre-populating the cache.

It could also help, if you can show some parts of your solrconfig file. What
is the solr version you use?

Regards,
Dmitry





-- 
Regards,

Dmitry Kan


Re: solr 1.4 highlighting issue

2011-09-14 Thread Michael Sokolov
The highlighter gives you snippets of text surrounding words (terms)
drawn from the query.  The whole document should satisfy the query (i.e.
it probably has "ships" somewhere else in it), but each snippet won't
generally have all the terms.
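
A sketch of requesting more snippets per document so matches on the remaining
terms also surface (SolrJ; these highlighting parameters exist in Solr 1.4):

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MoreSnippets {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("(drill AND ships) OR rigs");
        q.setHighlight(true);
        q.setParam("hl.fl", "Contents");
        q.setHighlightSnippets(3); // more snippets raise the odds of seeing each term
        Map<String, Map<String, List<String>>> hl =
            server.query(q).getHighlighting();
        System.out.println(hl);
    }
}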


-Mike






Re: solr 1.4 highlighting issue

2011-09-14 Thread Koji Sekiguchi




Dmitry,

This is expected, even if you use the latest version of Solr.

You got the document because "rigs" was a hit in the document, but then the Highlighter
tries to match the individual terms of the query against the document again.

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


RE: NRT and commit behavior

2011-09-14 Thread Tirthankar Chatterjee
Erick,
Here are the answers to your questions:
Our index is 267 GB.
We are not optimizing.
No, we have not profiled yet to check the bottleneck, but the logs indicate that opening
the searchers is taking time.
Nothing except Solr is running on the machine.
Total memory is 16GB; Tomcat has 8GB allocated.
Everything is 64-bit: OS, JVM, and Tomcat.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, September 11, 2011 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: NRT and commit behavior

Hmm, OK. You might want to look at the non-cached filter query stuff; it's
quite recent.
The point here is that it is a filter that is applied only after all of the
less expensive filter queries are run. One of its uses is exactly ACL
calculations. Rather than calculate the ACL for the entire doc set, it only
calculates access for docs that have made it past all the other elements of the
query. See SOLR-2429, and note that it is 3.4 (currently being released)
only.

As to why your commits are taking so long, I have no idea given that you really 
haven't given us much to work with.

How big is your index? Are you optimizing? Have you profiled the application to 
see what the bottleneck is (I/O, CPU, etc?). What else is running on your 
machine? It's quite surprising that it takes that long. How much memory are you 
giving the JVM? etc...

You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Fri, Sep 9, 2011 at 9:41 AM, Tirthankar Chatterjee 
tchatter...@commvault.com wrote:
 Erick,
 What you said is correct: for us the searches are based on some Active
 Directory permissions which are populated in the filter query parameter. So we
 don't have any warming query concept, as we cannot fire one for every user ahead
 of time.

 What we do here is that when a user logs in, we do an invalid query (which returns
 no results instead of '*') with the correct filter query (which is his
 permissions based on the login). This way the cache gets warmed up with valid
 docs.

 It works then.


 Also, can you please let me know why commit is taking 45 minutes to 1 hour on
 well-resourced hardware with multiple processors, 16GB RAM, a 64-bit VM, etc.?
 We tried passing waitSearcher as false and found that inside the code it is hard
 coded to be true. Is there any specific reason? Can we change that value to
 honor what is being passed?

 Thanks,
 Tirthankar

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, September 01, 2011 8:38 AM
 To: solr-user@lucene.apache.org
 Subject: Re: NRT and commit behavior

 Hmm, I'm guessing a bit here, but using an invalid query doesn't sound very
 safe, though I suppose it *might* be OK.

 What does "invalid" mean? A syntax error? Not safe.

 A search that returns 0 results? I don't know, but I'd guess that
 filling your caches, which is the point of warming queries, might be
 short-circuited if the query returns
 0 results, but I don't know for sure.

 But the fact that invalid queries return quicker does not inspire 
 confidence since the *point* of warming queries is to spend the time up front 
 so your users don't have to wait.

 So here's a test. Comment out your warming queries.
 Restart your server and fire the warming query from the browser
 with &debugQuery=on and look at the QTime parameter.

 Now fire the same form of the query (as in the same sort, facet, grouping, 
 etc, but presumably a valid term). See the QTime.

 Now fire the same form of the query with a *different* value in the query. 
 That is, it should search on different terms but with the same sort, facet, 
 etc. to avoid getting your data straight from the queryResultCache.

 My guess is that the last query will return much more quickly than the second 
 query. Which would indicate that the first form isn't doing you any good.

 But a test is worth a thousand opinions.
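
 A sketch of running that comparison programmatically rather than from the
 browser (SolrJ; the query strings are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class WarmupCheck {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        String[] queries = { "invalidterm", "realterm1", "realterm2" }; // placeholders
        for (String s : queries) {
            SolrQuery q = new SolrQuery(s);
            q.setParam("debugQuery", "on");
            // QTime is the server-side search time, so warmed vs. unwarmed shows here:
            System.out.println(s + " QTime=" + server.query(q).getQTime());
        }
    }
}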

 Best
 Erick

 On Wed, Aug 31, 2011 at 11:04 AM, Tirthankar Chatterjee 
 tchatter...@commvault.com wrote:
 Also, we noticed that the waitSearcher parameter value is not honored inside
 commit. It always defaults to true, which makes indexing slow.

 What we are trying to do is use an invalid query (which won't return any
 results) as a warming query. This way the commit returns faster. Are we
 doing something wrong here?

 Thanks,
 Tirthankar

 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Monday, July 18, 2011 11:38 AM
 To: solr-user@lucene.apache.org; yo...@lucidimagination.com
 Subject: Re: NRT and commit behavior

 In practice, in my experience at least, a very 'expensive' commit can 
 still slow down searches significantly, I think just due to CPU (or
 i/o?) starvation. Not sure anything can be done about that.  That's my 
 experience in Solr 1.4.1, but since searches have always been async with 
 commits, it probably is the same situation even in more recent versions, I'd 
 guess.

 On 7/18/2011 11:07 AM, Yonik Seeley wrote:
 

RE: NRT and commit behavior

2011-09-14 Thread Tirthankar Chatterjee
Erick,
Also, here is our solrconfig, where we have tried increasing the caches.
Setting the autowarmCount values below to 0 helps the commit call return
within a second, but that will slow us down on searches.

<filterCache
    class="solr.FastLRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="4096"/>

<!-- Cache used to hold field values that are quickly accessible
     by document id.  The fieldValueCache is created by default
     even if not configured here.
  <fieldValueCache
      class="solr.FastLRUCache"
      size="512"
      autowarmCount="128"
      showItems="32"
  />
-->

<!-- queryResultCache caches results of searches - ordered lists of
     document ids (DocList) based on a query, a sort, and the range
     of documents requested.  -->
<queryResultCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="4096"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document).
     Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->
<documentCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="512"/>


Re: How to return a function result instead of doclist in the Solr collapsing/grouping feature?

2011-09-14 Thread Erick Erickson
Well, what is the average of latitude and longitude? If you're asking
for the average of all the docs that match, or the average of all the
docs in the corpus, no, I don't think you can unless you write a custom
plugin.

Something like this has been talked about, see:
 https://issues.apache.org/jira/browse/SOLR-1622
but I don't think any such thing has been implemented.
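
If the average is meant over the documents a query matches, one client-side
workaround is to aggregate per category from the returned docs; a sketch
(SolrJ 3.x; not from the thread, and only practical for result sets small
enough to page through):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class CategoryCentroids {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.setFields("category", "latitude", "longitude");
        q.setRows(10000); // fetch the matches and aggregate client-side
        Map<String, double[]> acc = new HashMap<String, double[]>(); // sumLat, sumLon, count
        for (SolrDocument doc : server.query(q).getResults()) {
            String cat = (String) doc.getFieldValue("category");
            double[] a = acc.get(cat);
            if (a == null) acc.put(cat, a = new double[3]);
            a[0] += (Double) doc.getFieldValue("latitude");
            a[1] += (Double) doc.getFieldValue("longitude");
            a[2]++;
        }
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            System.out.println(e.getKey() + ": " + (a[0] / a[2]) + "," + (a[1] / a[2]));
        }
    }
}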

Best
Erick

On Mon, Sep 12, 2011 at 5:37 PM, Pablo Ricco pri...@gmail.com wrote:
 I have the following solr fields in schema.xml:

   - id (string)
   - name (string)
   - category(string)
   - latitude (double)
   - longitude(double)

 Is it possible to make a query that groups by category and returns the
 average of latitude and longitude instead of the doclist?

 Thanks,
 Pablo



RE: Out of memory

2011-09-14 Thread Rohit
Thanks Dmitry for the offer to help. I am using some caching in one of the
cores now. Earlier I was using it on other cores too, but I have commented
it out because of the frequent OOMs; there is also some warming up in one of the
cores. I have shared the links to my config files for all 4 cores:

http://haklus.com/crssConfig.xml
http://haklus.com/rssConfig.xml
http://haklus.com/twitterConfig.xml
http://haklus.com/facebookConfig.xml


Thanks again
Rohit





Re: indexing data from rich documents - Tika with solr3.1

2011-09-14 Thread Erick Erickson
FileListEntityProcessor presupposes it's looking at files on disk. It
doesn't know anything about the web. So, as the stack trace
indicates, it tries to open a directory called "http://." and fails.

What is it you're really trying to do here? Perhaps if you explain
your higher-level problem we can provide some help.

Best
Erick

On Mon, Sep 12, 2011 at 11:53 PM, scorpking lehoank1...@gmail.com wrote:
 Hi,
 Can you explain this problem to me?
 I have indexed data from multiple files using the Tika libs, and I have indexed
 data over HTTP, but only one file at a time (e.g. http://myweb/filename.pdf). Now I
 have many file formats under an HTTP path (e.g. http://myweb/files/). I tried to
 index data from an HTTP path, but it doesn't work. This is my data-config:

 <dataConfig>
   <dataSource type="BinURLDataSource" name="bin" encoding="utf-8"/>
   <document>
     <entity name="sd" processor="FileListEntityProcessor"
             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)"
             baseDir="http://www.lc.unsw.edu.au/onlib/pdf/"
             recursive="true" rootEntity="false"
             transformer="DateFormatTransformer">

       <entity name="tika-test" processor="TikaEntityProcessor"
               url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
         <field column="Author" name="author" meta="true"/>
         <field column="title" name="title" meta="true"/>
         <field column="text" name="text"/>
       </entity>
       <field column="file" name="filename"/>
     </entity>
   </document>
 </dataConfig>

 Error:
 Full Import failed: org.apache.solr.handler.dataimport.DataImportHandlerException:
 'baseDir' value: http://www.lc.unsw.edu.au/onlib/pdf/ is not a directory
 Processing Document # 1
        at org.apache.solr.handler.dataimport.FileListEntityProcessor.init(FileListEntityProcessor.java:124)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:69)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:552)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:267)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:186)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:353)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:411)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392)

 Thanks for your help.


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3331651.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to plug a new ANTLR grammar

2011-09-14 Thread Peter Keegan
Also, a question for Peter, at which stage do you use lucene analyzers
on the query? After it was parsed into the tree, or before we start
processing the query string?

I do the analysis before creating the tree. I'm pretty sure Lucene
QueryParser does this, too.
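
For concreteness, a sketch of that pre-tree analysis step (Lucene 3.x
TokenStream API; the analyzer choice and field name are illustrative):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzeBeforeTree {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
        TokenStream ts =
            analyzer.reusableTokenStream("Contents", new StringReader("Drilling SHIPS"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints: drilling, ships
        }
        ts.close();
    }
}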

Peter



Re: where is the SOLR_HOME ?

2011-09-14 Thread Juan Grande
Hi Ahmad,

While Solr is starting it writes the path to SOLR_HOME to the log. The
message looks something like:

Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader <init>

INFO: Solr home set to 'solr/'


If you're running the example, SOLR_HOME is usually
apache-solr-3.3.0/example/solr

Solr also writes a line like the following in the log for every JAR file it
loads:

Sep 14, 2011 9:14:53 AM org.apache.solr.core.SolrResourceLoader
 replaceClassLoader

INFO: Adding
 'file:/home/jgrande/apache-solr-3.3.0/contrib/extraction/lib/pdfbox-1.3.1.jar'
 to classloader


With this information you should be able to determine which JAR files Solr
is loading and I'm pretty sure that it's loading all the files you need. The
problem may be that you must also include
apache-solr-analysis-extras-3.3.0.jar from the apache-solr-3.3.0/dist
directory.
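
For example, something like the following in solrconfig.xml should pick up
both the contrib jars and the analysis-extras jar (the paths are relative to
the core's instance directory, so adjust them to your layout):

    <lib dir="../../contrib/analysis-extras/lib/" />
    <lib dir="../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />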

Regards,

*Juan*



On Wed, Sep 14, 2011 at 12:19 AM, ahmad ajiloo ahmad.aji...@gmail.comwrote:

 Hi
 In this page (
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
 ) it said:
 "Note: to use this filter, see solr/contrib/analysis-extras/README.txt for
 instructions on which jars you need to add to your SOLR_HOME/lib"
 I can't find SOLR_HOME/lib!
 1- Is it apache-solr-3.3.0\example\solr? There is no directory named lib
 there. I created an example/solr/lib directory, copied the jar files to it,
 and tested these expressions in solrconfig.xml:
 <lib dir="../../example/solr/lib" />
 <lib dir="./lib" />
 <lib dir="../../../example/solr/lib" /> (for more assurance!!!)
 but it doesn't work and I still get the following errors!

 2- or: apache-solr-3.3.0\ ? There is no directory named lib there.
 3- or: apache-solr-3.3.0\example ? There is a lib directory. I copied the 4
 libraries in solr/contrib/analysis-extras/ to apache-solr-3.3.0\example\lib,
 but some errors still occur when loading the page
 "http://localhost:8983/solr/admin":

 I use Nutch to crawling the web and fetching web pages. I send data of
 Nutch
 to Solr for Indexing. according to Nutch tutorial (
 http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch)
 I
 should copy schema.xml of Nutch to conf directory of Solr.
 So I added all of my required analyzers, like ICUNormalizer2FilterFactory, to
 this new schema.xml.


 This is schema.xml (I marked the text I added with an "added" comment):
 <?xml version="1.0" encoding="UTF-8" ?>
 <schema name="nutch" version="1.3">
    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true"
            omitNorms="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>
        <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>
        <fieldType name="date" class="solr.TrieDateField" precisionStep="0"
            omitNorms="true" positionIncrementGap="0"/>

        <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory"
                    ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0"
                    splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory"
                    protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>

        <!-- added -->
        <fieldType name="text_icu" class="solr.TextField"
            autoGeneratePhraseQueries="false">
            <analyzer>
                <tokenizer class="solr.ICUTokenizerFactory"/>
            </analyzer>
        </fieldType>
        <fieldType name="icu_sort_en" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.ICUCollationKeyFilterFactory"
                    locale="en" strength="primary"/>
            </analyzer>
        </fieldType>
        <fieldType name="normalized" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf"
                    mode="compose"/>
            </analyzer>
        </fieldType>
        <fieldType name="folded" class="solr.TextField">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ICUFoldingFilterFactory"/>
            </analyzer>
        </fieldType>
        <fieldType name="transformed" class="solr.TextField">
            <analyzer>

RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks 
like this:

if (_closed)
throw new IOException("Closed");

[http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/eclipse/jetty/server/HttpOutput.java?format=ok
 -- which may or may not match exactly, but I doubt that this code changes all 
that often.]

I would read this as Jetty thinking that this HTTP connection is closed.

Is this perhaps a case of your HTTP client disconnecting (or crashing) before 
Jetty can get the entire message (HTTP response) sent?

(The other alternative that occurs to me would be that Solr told Jetty the 
response was all done, but then turned around and tried to send more in the 
response).

-Original Message-
From: Michael Szalay [mailto:michael.sza...@basis06.ch] 
Sent: Wednesday, September 14, 2011 1:47 AM
To: solr-user@lucene.apache.org; JETTY user mailing list
Subject: EofException with Solr in Jetty

Hi all

sometimes we have this error in our system. We are running Solr 3.1.0 running 
on Jetty 7.2.2

Anyone an idea how to tune this?

14:41:05,693 | ERROR | qtp283504850-36 | SolrDispatchFilter | 
apache.solr.common.SolrException 151 | 154 - 
mvn_ch.basis06.eld.indexer_ch.basis06.eld.indexer.solrserver_0.1-SNAPSHOT_war - 
0 | org.eclipse.jetty.io.EofException
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:149)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:96)
at 
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
at 
org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
at 
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at 
org.ops4j.pax.web.service.internal.WelcomeFilesFilter.doFilter(WelcomeFilesFilter.java:169)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1322)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:473)
at 
org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:516)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:929)
at 
org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:116)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:403)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:184)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:864)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.ops4j.pax.web.service.jetty.internal.JettyServerHandlerCollection.handle(JettyServerHandlerCollection.java:72)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
at org.eclipse.jetty.server.Server.handle(Server.java:352)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1051)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:590)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:212)
at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:426)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:508)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:451)
at java.lang.Thread.run(Thread.java:662)

-- 
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business 



BigDecimal data type

2011-09-14 Thread Kissue Kissue
Hi,

Is there a way to use BigDecimal as a data type in solr? I am using solr
3.3.

Thanks.


Re: EofException with Solr in Jetty

2011-09-14 Thread Michael Szalay
We are using SolrJ 3.1 as our http client...
So it may be a bug in there?

Regards Michael

--
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business

- Original Message -
From: Jay Jaeger - DOT jay.jae...@dot.wi.gov
To: solr-user@lucene.apache.org, JETTY user mailing list 
jetty-us...@eclipse.org
Sent: Wednesday, 14 September 2011 15:21:19
Subject: RE: EofException with Solr in Jetty

Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks 
like this:

if (_closed)
throw new IOException("Closed");

[http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/eclipse/jetty/server/HttpOutput.java?format=ok
 -- which may or may not match exactly, but I doubt that this code changes all 
that often.]

I would read this as Jetty thinking that this HTTP connection is closed.

Is this perhaps a case of your HTTP client disconnecting (or crashing) before 
Jetty can get the entire message (HTTP response) sent?

(The other alternative that occurs to me would be that Solr told Jetty the 
response was all done, but then turned around and tried to send more in the 
response).




Re: Managing solr machines (start/stop/status)

2011-09-14 Thread Shawn Heisey

On 9/13/2011 6:05 PM, Jamie Johnson wrote:

I know this isn't a solr specific question but I was wondering what
folks do in regards to managing the machines in their solr cluster?
Are there any recommendations for how to start/stop/manage these
machines?  Any suggestions would be appreciated.


What do you mean by "manage"?

For stopping and starting, I built my own redhat-friendly init script to 
handle jetty.  It uses a file in /etc/sysconfig for commandline options.


You can see my init script here:

http://pastebin.com/GweJVGk5

Here's what I have in /etc/sysconfig/solr:
STARTARGS="-Xms3072M -Xmx3072M -XX:NewSize=2048M -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-Dsolr.solr.home=/index/solr -Dsolr.clustering.enabled=true
-DSTOP.PORT=8079 -DSTOP.KEY=somePassword"

STOPARGS="-DSTOP.PORT=8079 -DSTOP.KEY=somePassword"

I'm running CentOS5, but I ran into a problem with the fuser command 
that I use in the init script.  I filed a bug with CentOS, but since the 
bug comes from upstream, they were not able to fix it.  You may need to 
install a new psmisc package to use the init script:


http://bugs.centos.org/view.php?id=4260

The script works fine on CentOS 6.

Thanks,
Shawn



math with date and modulo

2011-09-14 Thread stockii
Hello.

I am fighting with the FunctionQuery of Solr.

I try to get a diff of today and a date field. From this diff, I want to do a
modulo with another field whose values are 1, 3, 6, 12.

In a function, something like this (I know that some functions are not
available in Solr):

q={!func}$v2=0&v1=(NOW - $var)&v2=modulo($v1,interval)

OR 

(DIFF(Month of Today - Month of Search) MOD interval) = 0

Can anybody give me some tips?
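
For what it's worth, the date-diff part can be written with the existing ms()
function (a sketch, with the field name assumed):

    ms(NOW/DAY,mydatefield)                     difference in milliseconds
    div(ms(NOW/DAY,mydatefield),2592000000)     rough difference in 30-day months

As far as I know there is no built-in modulo function in Solr 3.x, so the MOD
step would need a custom function (ValueSourceParser) plugin.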

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 
1 Core with 45 Million Documents, other Cores < 200.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx


RE: index not created

2011-09-14 Thread Jaeger, Jay - DOT
 changed the configuration to point it to my solr dir and started it again

You might look in your logs to see where Solr thinks the Solr home directory is 
and/or if it complains about not being able to find it.  As a guess, it can't 
find it, perhaps because solr.solr.home does not point to the right place.  As 
a result, the servlet can't actually find the Solr code, and isn't really 
indexing anything at all.

For my tomcat install test, I put the following in startup.bat  (Windows, but 
the Linux startup script startup.sh would be similar).

set JAVA_OPTS=-Dsolr.solr.home=C:/pro/apache-solr-3.3.0/example/solr

(my JAVA_OPTS has a bunch of other stuff for security, waffle, etc., but this 
is the one that would matter in your case).

JRJ

-Original Message-
From: kumar8anuj [mailto:kumar.an...@gmail.com] 
Sent: Wednesday, September 14, 2011 4:21 AM
To: solr-user@lucene.apache.org
Subject: Re: index not created

Hi Erick,
I have not done anything different. I downloaded the solr tar
from one of the mirrors, extracted it in the home directory, started
jetty, and it works fine.
   For tomcat I copied the war file into my webapps folder, restarted
tomcat, changed the configuration to point it to my solr dir, and started it
again. Same setup, everything is the same. Even this time I have tried it with
the example solr folder without the multicore setup, and in solrconfig.xml all
the lib paths are the same as they were for jetty. But still nothing is getting
indexed: it shows that 1 document is there, but the text field doesn't show
anything in it, and nothing comes back when I search for something from the
document.
Am I doing something wrong? Please let me know. I have to implement it
ASAP. Please help me, or if you can give me a document on implementing the
same in tomcat, I would try that way.


Thanks,



RE: Schema fieldType y-m-d ?!?!

2011-09-14 Thread Jaeger, Jay - DOT
Just add a bogus 0 timestamp after it when you index it.  That is what we did.  
Dates are not stored or indexed as characters, anyway, so space would not be 
any different one way or the other.
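
For example (field name assumed), a value for 2011-09-14 would be indexed as:

    <field name="publish_date">2011-09-14T00:00:00Z</field>

Solr's date syntax requires the full form, so the time part is simply pinned
to midnight.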

JRJ

-Original Message-
From: stockii [mailto:stock.jo...@googlemail.com] 
Sent: Wednesday, September 14, 2011 4:56 AM
To: solr-user@lucene.apache.org
Subject: Schema fieldType y-m-d ?!?!

Is it possible to index a date field in the format y-m-d? I don't need
the timestamp, so I can save some space.


Which ways exist to search with a complex date filter?

-
--- System 

One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents, other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 5GB Xmx
- Solr2 for Update-Request  - delta every Minute - 4GB Xmx


Re: Shouldn't ReversedWildcardFilterFactory resolve leadingWildcard?

2011-09-14 Thread Tomás Fernández Löbbe
'auto*' is not a leading wildcard query; a leading wildcard query would be
'*car'. Wildcard queries in general will take more time than regular
queries, and the closer the wildcard is to the first character, the more
expensive the query is.
With a regular field type, Solr will allow wildcards (not with dismax) like
'auto*', but not leading wildcard queries like '*car'.

The ReversedWildcardFilter is there for allowing leading wildcards on
searches. It only needs to be added at index time (not at query time). When
using this field type, all the terms at index time will be reversed like you
showed in your example, adding an *impossible character* at the beginning of
the term to prevent it from matching regular terms.

'autocar' will be indexed as 'autocar' and '&#1;racotua' (see '&#1;', that's
the impossible character).

When you search for 'auto*', Solr will resolve the query as always, but if
you search for '*car', the query parser (not any analysis filter, which is why
you don't need to add the filter at query time) will invert that term and
add the 'impossible character' at the beginning, like '&#1;rac*'. That's why
'&#1;racotua' should match the query.

From your configuration, if you remove the filter at query-time it should
work.
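
For reference, an index-time-only configuration along the lines of the stock
example schema (the attribute values below are just the shipped defaults,
adjust as needed):

    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>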

Regards,

Tomás

On Wed, Sep 14, 2011 at 6:26 AM, crisfromnova crisfromn...@gmail.comwrote:

 I found a partial solution.
 Using ReverseStringFilterFactory instead of ReversedWildcardFilterFactory and
 searching for rac* will find autocar, for example.




Re: DIH delta last_index_time

2011-09-14 Thread Rahul Warawdekar
Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility of handling this scenario would be to customize
DataImportHandler code as follows

1. Add one more configuration property named dbTimeZone at the entity
level in data-config.xml file
2. While saving the lastIndexTime in the properties file, save it according
to the timezone specified in the config so that it is in sync with the
database
server time.

Basically customize the code so that all the time related updates to the
dataimport.properties file should be timezone specific.
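
A minimal sketch of the idea ("dbTimeZone" is the proposed new property, not
an existing DIH attribute; the date format matches what DIH writes today):

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Properties;
    import java.util.TimeZone;

    // dbTimeZone would come from the <entity dbTimeZone="..."> attribute
    String dbTimeZone = "America/Chicago";  // hypothetical example value
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone(dbTimeZone));

    Properties props = new Properties();
    // last_index_time is now expressed in the database server's local time
    props.setProperty("last_index_time", fmt.format(new Date()));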


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
 maria.vazq...@dexone.com wrote:
  Hi,
  How do you handle the situation where the time on the server running Solr
  doesn't match the time in the database?

 Firstly, why is that the case? NTP is pretty universal
 these days.

  I'm using the last_index_time saved by Solr in the delta query, checking it
  against the lastModifiedDate field in the database, but the times are not in
  sync, so I might lose some changes.
  Can we use something else other than last_index_time? Maybe something
 like
  last_pk or something.

 One possible way is to edit dataimport.properties, manually or through
 a script, to put the last_index_time back to a safe value.

 Regards,
 Gora




-- 
Thanks and Regards
Rahul A. Warawdekar


query - part default OR and part default AND

2011-09-14 Thread Omri Cohen
Hi All,

I have two fields in my schema: field1 and field2. For the sake of the example,
I'll define two phrases:
phrase1 - "solr is the best fts ever"
phrase2 - "let us all contribute to open source for a better world"

Now I want to perform the next query:

field1:(phrase1) AND field2:(phrase2)

My default operator is AND, but I want to search within field1 with the AND
operator between the tokens and within field2 with the OR operator.
What I already tried is to split phrase1 by whitespace, change the
default search operator to OR in the schema, and add + signs before each
word:

field1:(+solr +is +the +best +fts +ever) AND field2:(let us all contribute
to open source for a better world)

This query is not good, because when I split phrase1 it is not how
the index-time tokenizer splits it, so I am not getting the results I
would like...

Any ideas, anyone?

Thanks,

Omri


NewSolrCloudDesign question

2011-09-14 Thread darren

Hi,
  I am very excited to see this direction for Solr. I realize it's early
still, but is there any thought as to what the target release date might be
(this year? next?).

Also, will the new solr cloud support all query types, including all forms
of faceting, distributed IDF, range queries, sorting, paging, etc.?

Thanks!
Darren


Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
Hi, I'm having performance troubles with solr. I don't know if I'm expecting
too much from solr or I misconfigured it.
When I run a single query its QTime is 500-1000~ ms (without any use of
caches).
When I run my test script (with use of caches) QTime increases
exponentially, reaching 8000~ to 6~ ms. And CPU usage also increases to
~550%.

My solr-start script:
java -Duser.timezone=EET -Xmx6000m -jar ./start.jar

2,000,000~ documents ,  currently there aren't any commits but in future
there will be 5,000~ updates/additions to documents every 3-5~   min via
delta import.

Search Query
sort=userscore+desc
start=0
q=photo_id:* AND gender:true AND country:MALAWI AND online:false
fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]  ( Random age ranges )
fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options,   [* TO
NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
fq=userscore:[500 TO *]  ( Only 2 options, [500 TO *] or [* TO 500] )
rows=150

Schema

<field name="id" type="long" indexed="true" stored="true" required="true"/>
<field name="username" type="string" indexed="true" stored="false"
required="true"/>
<field name="namesurname" type="string" indexed="true" stored="false"/>
<field name="network" type="int" indexed="true" stored="false"/>
<field name="photo_id" type="int" indexed="true" stored="false"/>
<field name="gender" type="boolean" indexed="true" stored="false"/>
<field name="country" type="string" indexed="true" stored="false"/>
<field name="birth" type="tdate" indexed="true" stored="false"/>
<field name="lastlogin" type="tdate" indexed="true" stored="false"/>
<field name="online" type="boolean" indexed="true" stored="false"/>
<field name="userscore" type="int" indexed="true" stored="false"/>

Cache Sizes & Lazy Load

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096"
autowarmCount="4096"/>
<queryResultCache class="solr.LRUCache" size="16384" initialSize="4096"
autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="16384" initialSize="4096"
autowarmCount="4096"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>


Re: Running solr on small amounts of RAM

2011-09-14 Thread Mike Austin
Just wanted to follow up and say thanks for all the valuable replies.  I'm
in the process of testing everything.

Thanks,
Mike

On Mon, Sep 12, 2011 at 1:20 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 Beyond the suggestions already made, i would add:

 a) being really aggressive about stop words can help keep the index size
 down, which can help reduce the amount of memory needed to scan the term
 lists

 b) faceting w/o any caching is likely going to be too slow to be
 acceptable.

 c) don't sort on anything except score.

 -Hoss



RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
I have not used SolrJ, but it probably is worth considering as a possible 
suspect.

Also, do you have anything in between the client and the Solr server (a 
firewall, load balancer, etc.?) that might play games with HTTP connections?

You might want to start up a network trace on the server or network to see if 
you can catch one to see what is going on.
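
For example, on Linux something like this (interface and port are assumptions;
8983 is Solr's example port):

    tcpdump -i any -s 0 -w solr-eof.pcap port 8983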

I looked at our Solr 3.1 prototype log (which has been running continuously 
without interruption since July 10!), and did not see any of these errors.  We 
do not use SolrJ -- we use a combination of plain old HTTP/javascript/xslt and 
requests coming from another system as a (plain old XML) web service to get to 
Solr.

However, that is under Jetty 6.

JRJ

-Original Message-
From: Michael Szalay [mailto:michael.sza...@basis06.ch] 
Sent: Wednesday, September 14, 2011 8:27 AM
To: solr-user@lucene.apache.org
Subject: Re: EofException with Solr in Jetty

We are using SolrJ 3.1 as our http client...
So it may be a bug in there?

Regards Michael

-- 
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business 


Re: NewSolrCloudDesign question

2011-09-14 Thread Yonik Seeley
On Wed, Sep 14, 2011 at 10:17 AM,  dar...@ontrenet.com wrote:

 Hi,
  I am very excited to see this direction for Solr. I realize its early
 still,
 but is there any thought as to what the target release date might be (this
 year? next?).

We've started to work on the new functionality now, but an official
release would be whenever Lucene/Solr 4.0 is released ;-)

 Also, will the new solr cloud support all query types including all forms
 of faceting,
 distributed IDF, ranging, sorting, paging etc?

Yes, it will build off the current distributed search.  We still need
to implement distributed IDF, but that shouldn't be too hard.

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


Re: NewSolrCloudDesign question

2011-09-14 Thread darren

Thank you. Should be awesome when its ready!

On Wed, 14 Sep 2011 10:25:26 -0400, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, Sep 14, 2011 at 10:17 AM,  dar...@ontrenet.com wrote:

 Hi,
  I am very excited to see this direction for Solr. I realize it's early
 still,
 but is there any thought as to what the target release date might be
 (this
 year? next?).
 
  We've started to work on the new functionality now, but an official
 release would be whenever Lucene/Solr 4.0 is released ;-)
 
 Also, will the new solr cloud support all query types including all
forms
 of faceting,
 distributed IDF, ranging, sorting, paging etc?
 
 Yes, it will build off the current distributed search.  We still need
 to implement distributed IDF, but that shouldn't be too hard.
 
 -Yonik
 http://www.lucene-eurocon.com - The Lucene/Solr User Conference


RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I think folks are going to need a *lot* more information.  Particularly

1.  Just what does your test script do?   Is it doing updates, or just 
queries of the sort you mentioned below?  
2.  If the test script is doing updates, how are those updates being fed to 
Solr?  
3.  What version of Solr are you running?
4.  Why did you increase the default for jetty (around 384m) to 6000m, 
particularly given your relatively modest number of documents (2,000,000).
5.  Machine characteristics, particularly operating system and physical memory 
on the machine.

Please refer to http://wiki.apache.org/solr/UsingMailingLists for additional 
guidance in using the mailing list to get help.

-Original Message-
From: Yusuf Karakaya [mailto:karakaya...@gmail.com] 
Sent: Wednesday, September 14, 2011 9:19 AM
To: solr-user@lucene.apache.org
Subject: Performance troubles with solr

Hi, I'm having performance troubles with solr. I don't know if I'm expecting
too much from solr or I misconfigured it.
When I run a single query its QTime is 500-1000~ ms (without any use of
caches).
When I run my test script (with use of caches) QTime increases
exponentially, reaching 8000~ to 6~ ms. And CPU usage also increases to
~550%.

My solr-start script:
java -Duser.timezone=EET -Xmx6000m -jar ./start.jar

2,000,000~ documents ,  currently there aren't any commits but in future
there will be 5,000~ updates/additions to documents every 3-5~   min via
delta import.

Search Query
sort=userscore+desc
start=0
q=photo_id:* AND gender:true AND country:MALAWI AND online:false
fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]  ( Random age ranges )
fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options,   [* TO
NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
fq=userscore:[500 TO *]  ( Only 2 options, [500 TO *] or [* TO 500] )
rows=150



Re: EofException with Solr in Jetty

2011-09-14 Thread Michael Szalay
There is nothing between the client app and the solr server; it's on the same
machine and on the same app server, only going through the loopback interface.
Unfortunately, I cannot reproduce it, but I see it in the server log.

Thanks
Michael

--
Michael Szalay
Senior Software Engineer

basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22
http://www.basis06.ch - source of smart business

- Original Message -
From: Jay Jaeger - DOT jay.jae...@dot.wi.gov
To: solr-user@lucene.apache.org
Sent: Wednesday, 14 September 2011 16:23:45
Subject: RE: EofException with Solr in Jetty

I have not used SolrJ, but it probably is worth considering as a possible 
suspect.

Also, do you have anything in between the client and the Solr server (a 
firewall, load balancer, etc.?) that might play games with HTTP connections?

You might want to start up a network trace on the server or network to see if 
you can catch one to see what is going on.

I looked at our Solr 3.1 prototype log (which has been running continuously 
without interruption since July 10!), and did not see any of these errors.  We 
do not use SolrJ -- we use a combination of plain old HTTP/javascript/xslt and 
requests coming from another system as a (plain old XML) web service to get to 
Solr.

However, that is under Jetty 6.

JRJ


glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Xue-Feng Yang
Hi all,


I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and eclipse 
indigo. I have the following error:

SEVERE: org.apache.solr.common.SolrException: Error loading class 
'org.apache.solr.handler.dataimport.DataImportHandler'

However, I have a line

  <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

in solrconfig.xml and dist is under the solr.solr.home directory. 

If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext
then this error will not appear. But this should not be the right way.

Any thought?

Thanks.

Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread darren

Here's a thought.

If dist is under solr.solr.home but your lib dir is set to be "../../dist",
wouldn't the lib dir be relative to solr.solr.home and therefore should
just be "dist"?

On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang
just4l...@yahoo.com wrote:
 Hi all,
 
 
 I am trying set up solr 3.3 with multicores in glassfish 3.1.1 and
eclipse
 indigo. I have the following error:
 
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler'
 
 However, I have a line
 
   <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
 
 in solrconfig.xml and dist is under the solr.solr.home directory. 
 
 If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext
 then this error will not appear. But this should not be right way.
 
 Any thought?
 
 Thanks.


Re: Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
Thank you for your reply.
I tried to give most of the information I could, but obviously I missed some.
1.  Just what does your test script do?   Is it doing updates, or just
queries of the sort you mentioned below?
The test script only sends random queries.
2.  If the test script is doing updates, how are those updates being fed to
Solr?
There are no updates right now, as I failed on performance.
3.  What version of Solr are you running?
I'm using Solr 3.3.0.
4.  Why did you increase the default for jetty (around 384m) to 6000m,
particularly given your relatively modest number of documents (2,000,000).
I was trying everything before asking here.
5.  Machine characteristics, particularly operating system and physical
memory on the machine.
OS = Debian 6.0,  Physical Memory = 32 GB, CPU = 2x Intel Quad Core

On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote:

 I think folks are going to need a *lot* more information.  Particularly

 1.  Just what does your test script do?   Is it doing updates, or just
 queries of the sort you mentioned below?
 2.  If the test script is doing updates, how are those updates being fed to
 Solr?
 3.  What version of Solr are you running?
 4.  Why did you increase the default for jetty (around 384m) to 6000m,
 particularly given your relatively modest number of documents (2,000,000).
 5.  Machine characteristics, particularly operating system and physical
 memory on the machine.

 Please refer to http://wiki.apache.org/solr/UsingMailingLists for
 additional guidance in using the mailing list to get help.




Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Xue-Feng Yang
Thanks for a quick reply. 


I just tested as you suggested. The error is still there.

The setup line

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

actually comes with the solr 3.3 release; it was not added by me.





From: dar...@ontrenet.com dar...@ontrenet.com
To: Xue-Feng Yang just4l...@yahoo.com
Cc: solr-user@lucene.apache.org
Sent: Wednesday, September 14, 2011 10:52:55 AM
Subject: Re: glassfish, solrconfig.xml  and SolrException: Error loading 
DataImportHandler


Here's a thought.

If dist is under solr.solr.home but your lib dir is set to be "../../dist",
wouldn't the lib dir be relative to solr.solr.home and therefore should
just be "dist"?


RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I don't have enough experience with filter queries to advise well on when to 
use fq vs. putting it in the query itself, but I do know that we are not using 
filter queries, and with index sizes ranging from 7 Million to 27+ Million we 
have not seen this kind of issue.

Maybe keeping 16,384 filter queries around, particularly caching the ones with 
random age ranges is eating your memory up -- so perhaps try moving just that 
particular fq into q instead (since it is random) and just cache the ones 
where the number of options is limited?

What happens if you try your test without the filter queries?  What happens if 
you put the additional criteria that are in your filter query into the query 
itself?
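
In other words, something along these lines (a sketch, parameter layout
assumed):

    q=photo_id:[* TO *] AND gender:true AND country:MALAWI AND online:false AND birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]
    fq=lastlogin:[* TO NOW-6MONTHS/DAY]
    fq=userscore:[500 TO *]

(As an aside, photo_id:[* TO *] is usually cheaper than the wildcard query
photo_id:*.)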

JRJ

-Original Message-
From: Yusuf Karakaya [mailto:karakaya...@gmail.com] 
Sent: Wednesday, September 14, 2011 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance troubles with solr

Thank you for your reply.
I tried to give most of the information I could, but obviously I missed some.
1.  Just what does your test script do?   Is it doing updates, or just
queries of the sort you mentioned below?
The test script only sends random queries.
2.  If the test script is doing updates, how are those updates being fed to
Solr?
There are no updates right now, as I failed on performance.
3.  What version of Solr are you running?
I'm using Solr 3.3.0.
4.  Why did you increase the default for jetty (around 384m) to 6000m,
particularly given your relatively modest number of documents (2,000,000).
I was trying everything before asking here.
5.  Machine characteristics, particularly operating system and physical
memory on the machine.
OS = Debian 6.0,  Physical Memory = 32 GB, CPU = 2x Intel Quad Core




RE: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Jaeger, Jay - DOT
Some things to think about:

When solr starts up, it should report the location of solr home.  Is it 
what you expect?
Is there any security on the dist directory that would prevent solr from 
accessing it?
Is there a classloader policy set on glassfish that could be getting in the way?

(your testing seems to eliminate the possibility of a JRE incompatibility)

JRJ

-Original Message-
From: Xue-Feng Yang [mailto:just4l...@yahoo.com] 
Sent: Wednesday, September 14, 2011 10:07 AM
To: solr-user@lucene.apache.org
Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading 
DataImportHandler

Thanks for a quick reply. 


I just tested as you suggested. The error is still there.

The setup line

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />

actually comes with the solr 3.3 release; it was not added by me.







Re: query - part default OR and part default AND

2011-09-14 Thread tamanjit.bin...@yahoo.co.in
Keep the default search operator as OR,
and for phrase1, on splitting on whitespace, just add AND between the tokens instead of +.

Hopefully this should work. Please do confirm.
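
Another option that might work is Solr's nested query syntax with LocalParams,
which sets the operator per clause and avoids re-tokenizing by hand (an
untested sketch):

    q=_query_:"{!lucene q.op=AND df=field1}solr is the best fts ever" AND _query_:"{!lucene q.op=OR df=field2}let us all contribute to open source for a better world"

This keeps each phrase intact, so index-time and query-time analysis stay in
agreement.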



Re: DIH delta last_index_time

2011-09-14 Thread tamanjit.bin...@yahoo.co.in
Rahul is right.

You may add a script to change the date in dataimport.properties to half an
hour before the last modification time before each delta-import.





Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Xue-Feng Yang
Thanks for your reply.

Actually, some of the cores are working perfectly. So it's not the 
solr.solr.home problem.





From: Jaeger, Jay - DOT jay.jae...@dot.wi.gov
To: solr-user@lucene.apache.org solr-user@lucene.apache.org; 'Xue-Feng 
Yang' just4l...@yahoo.com
Sent: Wednesday, September 14, 2011 11:21:18 AM
Subject: RE: glassfish, solrconfig.xml  and SolrException: Error loading 
DataImportHandler

Some things to think about:

When solr starts up, solr should report the location of solr home.  Is it 
what you expect?
Is there any security on the dist directory that would prevent solr from 
accessing it?
Is there a classloader policy set on glassfish that could be getting in the way?

(your testing seems to eliminate the possibility of a JRE incompatibility)

JRJ

-Original Message-
From: Xue-Feng Yang [mailto:just4l...@yahoo.com] 
Sent: Wednesday, September 14, 2011 10:07 AM
To: solr-user@lucene.apache.org
Subject: Re: glassfish, solrconfig.xml and SolrException: Error loading 
DataImportHandler

Thanks for a quick reply. 


I just tested as you suggested. The error is still there.

The setup line

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" /> 


is actually from the solr 3.3 release, not added by me.





From: dar...@ontrenet.com dar...@ontrenet.com
To: Xue-Feng Yang just4l...@yahoo.com
Cc: solr-user@lucene.apache.org
Sent: Wednesday, September 14, 2011 10:52:55 AM
Subject: Re: glassfish, solrconfig.xml  and SolrException: Error loading 
DataImportHandler


Here's a thought.

If "dist" is under solr.solr.home, but your lib dir is set to be
"../../dist", wouldn't the lib dir be relative to solr.solr.home, and
therefore shouldn't it just be "dist"?

On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang
just4l...@yahoo.com wrote:
 Hi all,
 
 
 I am trying to set up solr 3.3 with multicores in glassfish 3.1.1 and
eclipse
 indigo. I have the following error:
 
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler'
 
 However, I have a line
 
   <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar"
 />
 
 in solrconfig.xml and dist is under the solr.solr.home directory. 
 
 If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext
  then this error will not appear. But this should not be the right way.
 
 Any thought?
 
 Thanks.

Re: BigDecimal data type

2011-09-14 Thread Chris Hostetter

: Is there a way to use BigDecimal as a data type in solr? I am using solr
: 3.3.

if you just want to *store* BigDecimals in a solr index, then just use 
StrField with the canonical representation -- but if you want to sort or 
do range queries on the values, then no.
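
(a minimal SolrJ sketch of the store-only case -- the field name
"amount_str" is an assumption, and would be a StrField in your schema:)

import java.math.BigDecimal;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StoreBigDecimal {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-1");
        // toPlainString() avoids scientific notation, so the stored
        // string round-trips via new BigDecimal(storedString)
        doc.addField("amount_str", new BigDecimal("12345.678900").toPlainString());
        server.add(doc);
        server.commit();
    }
}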

Given that BigDecimal values allow for an arbitrary-precision unscaled 
value, i don't know if it would even be possible to encode BigDecimal values in a 
way that would allow arbitrary values to sort properly as Terms.

-Hoss


Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Xue-Feng Yang
After making another try, I found it worked with

  <lib dir="../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />


I leave this here in case someone else needs it too.

Thanks




From: dar...@ontrenet.com dar...@ontrenet.com
To: Xue-Feng Yang just4l...@yahoo.com
Cc: solr-user@lucene.apache.org
Sent: Wednesday, September 14, 2011 10:52:55 AM
Subject: Re: glassfish, solrconfig.xml  and SolrException: Error loading 
DataImportHandler


Here's a thought.

If "dist" is under solr.solr.home, but your lib dir is set to be
"../../dist", wouldn't the lib dir be relative to solr.solr.home, and
therefore shouldn't it just be "dist"?

On Wed, 14 Sep 2011 07:45:45 -0700 (PDT), Xue-Feng Yang
just4l...@yahoo.com wrote:
 Hi all,
 
 
 I am trying to set up solr 3.3 with multicores in glassfish 3.1.1 and
eclipse
 indigo. I have the following error:
 
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler'
 
 However, I have a line
 
   <lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar"
 />
 
 in solrconfig.xml and dist is under the solr.solr.home directory. 
 
 If I copy the jars to glassfish_root/glassfish/domains/domain1/lib/ext
  then this error will not appear. But this should not be the right way.
 
 Any thought?
 
 Thanks.

Re: Schema fieldType y-m-d ?!?!

2011-09-14 Thread tamanjit.bin...@yahoo.co.in
What we did was get the date from the db and store it in a string fieldType in
the format yyyymmdd. It works fine for us, as range queries work just fine.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-fieldType-y-m-d-tp3335359p3336309.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH delta last_index_time

2011-09-14 Thread Vazquez, Maria (STM)
Thanks Rahul
That sounds like a good solution; I will change the code to support different 
timezones. Maybe this could be included in the next release of Solr, since a few 
people have mentioned this problem too.
Thanks again
Maria


Sent from my Motorola ATRIX™ 4G on ATT

-Original message-
From: Rahul Warawdekar rahul.warawde...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, Sep 14, 2011 14:01:08 GMT+00:00
Subject: Re: DIH delta last_index_time

Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility of handling this scenario would be to customize
DataImportHandler code as follows

1. Add one more configuration property named dbTimeZone at the entity
level in data-config.xml file
2. While saving the lastIndexTime in the properties file, save it according
to the timezone specified in the config so that it is in sync with the
database
server time.

Basically customize the code so that all the time related updates to the
dataimport.properties file should be timezone specific.
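
A rough sketch of the timezone-specific formatting such a customization
could do when writing the properties file ("dbTimeZone" here is the
proposed, hypothetical property):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DbTimeZoneFormat {
    public static void main(String[] args) {
        // dbTimeZone would come from the entity config; hard-coded here
        String dbTimeZone = "America/New_York";
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(dbTimeZone));
        // format "now" in the database server's timezone before saving
        // it as last_index_time in dataimport.properties
        System.out.println(fmt.format(new Date()));
    }
}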


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
 maria.vazq...@dexone.com wrote:
  Hi,
  How do you handle the situation where the time on the server running Solr
  doesn't match the time in the database?

 Firstly, why is that the case? NTP is pretty universal
 these days.

  I'm using the last_index_time saved by Solr in the delta query checking
 it
  against lastModifiedDate field in the database but the times are not in
 sync
  so I might lose some changes.
  Can we use something else other than last_index_time? Maybe something
 like
  last_pk or something.

 One possible way is to edit dataimport.properties, manually or through
 a script, to put the last_index_time back to a safe value.

 Regards,
 Gora




-- 
Thanks and Regards
Rahul A. Warawdekar



Re: Performance troubles with solr

2011-09-14 Thread Yusuf Karakaya
I tried moving the age query from the filter query to the normal query, but nothing
really changed.
But when I tried moving everything into the query itself (removing all filter
queries), QTimes slowed much more.
I don't have a problem with memory or CPU usage; my problem is query response
times.
When I send only one query, response times vary from 500 ms to 1000 ms (non-
cached), and that's too much.
When I send a set of random queries (10-20 queries per second), response
times go crazy (8 seconds to 60+ seconds).

On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote:

 I don't have enough experience with filter queries to advise well on when
 to use fq vs. putting it in the query itself, but I do know that we are not
 using filter queries, and with index sizes ranging from 7 Million to 27+
 Million we have not seen this kind of issue.

 Maybe keeping 16,384 filter queries around, particularly caching the ones
 with random age ranges is eating your memory up -- so perhaps try moving
 just that particular fq into q instead (since it is random) and just cache
 the ones where the number of options is limited?

 What happens if you try your test without the filter queries?  What happens
 if you put the additional criteria that are in your filter query into the
 query itself?

 JRJ

 -Original Message-
 From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
 Sent: Wednesday, September 14, 2011 9:54 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Performance troubles with solr

 Thank you for your reply.
 I tried to give most of the information i can but obviously i missed some.
 1.  Just what does your test script do?   Is it doing updates, or just
 queries of the sort you mentioned below?
 Test script only sends random queries.
 2.  If the test script is doing updates, how are those updates being fed to
 Solr?
 There are no updates right now, as i failed on performance.
 3.  What version of Solr are you running?
 I'm using Solr 3.3.0
 4.  Why did you increase the default for jetty (around 384m) to 6000m,
 particularly given your relatively modest number of documents (2,000,000).
 I was trying everything before asking here.
 5.  Machine characteristics, particularly operating system and physical
 memory on the machine.
 OS = Debian 6.0,  Physcal Memory = 32 gb, CPU = 2x Intel Quad Core

 On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov
 wrote:

  I think folks are going to need a *lot* more information.  Particularly
 
  1.  Just what does your test script do?   Is it doing updates, or just
  queries of the sort you mentioned below?
  2.  If the test script is doing updates, how are those updates being fed
 to
  Solr?
  3.  What version of Solr are you running?
  4.  Why did you increase the default for jetty (around 384m) to 6000m,
  particularly given your relatively modest number of documents
 (2,000,000).
  5.  Machine characteristics, particularly operating system and physical
  memory on the machine.
 
  Please refer to http://wiki.apache.org/solr/UsingMailingLists for
  additional guidance in using the mailing list to get help.
 
  -Original Message-
  From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
  Sent: Wednesday, September 14, 2011 9:19 AM
  To: solr-user@lucene.apache.org
  Subject: Performance troubles with solr
 
  Hi, I'm having performance troubles with Solr. I don't know if I'm
  expecting
  too much from Solr or I misconfigured it.
  When I run a single query its QTime is 500-1000~ ms (without any use of
  caches).
  When I run my test script (with use of caches) QTime increases
  exponentially, reaching 8000~ to 60,000~ ms. And CPU usage also increases
  to
  ~550%
 
  My solr-start script:
  java -Duser.timezone=EET -Xmx6000m -jar ./start.jar
 
  2,000,000~ documents ,  currently there aren't any commits but in future
  there will be 5,000~ updates/additions to documents every 3-5~   min via
  delta import.
 
  Search Query
  sort=userscore+desc
  start=0
  q=photo_id:* AND gender:true AND country:MALAWI AND online:false
  fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]  ( Random age ranges )
  fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options,   [* TO
  NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
  fq=userscore:[500 TO *]  ( Only 2 options, [500 TO *] or [* TO 500] )
  rows=150
 
  Schema
 
  <field name="id" type="long" indexed="true" stored="true"
 required="true"/>
  <field name="username" type="string" indexed="true" stored="false"
  required="true"/>
  <field name="namesurname" type="string" indexed="true" stored="false"/>
  <field name="network" type="int" indexed="true" stored="false"/>
  <field name="photo_id" type="int" indexed="true" stored="false"/>
  <field name="gender" type="boolean" indexed="true" stored="false"/>
  <field name="country" type="string" indexed="true" stored="false"/>
  <field name="birth" type="tdate" indexed="true" stored="false"/>
  <field name="lastlogin" type="tdate" indexed="true" stored="false"/>
  <field name="online" type="boolean" indexed="true" stored="false"/>
  <field name="userscore" type="int" 

Re: Schema fieldType y-m-d ?!?!

2011-09-14 Thread Alexei Martchenko
If you don't need date-specific functions and/or faceting, you can store it
as an int, like 20110914, and parse it in your application
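
(for example, a sketch of the parse on the application side, assuming
yyyyMMdd ints:)

import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseDateInt {
    public static void main(String[] args) throws Exception {
        int stored = 20110914; // as returned from the Solr int field
        Date d = new SimpleDateFormat("yyyyMMdd").parse(Integer.toString(stored));
        System.out.println(d);
    }
}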

but I don't recommend it... as a rule of thumb, dates should be stored as
dates; the millennium bug (Y2K bug) was all about 'saving some space',
remember?


Index not getting refreshed

2011-09-14 Thread Pawan Darira
Hi

I am using Solr 3.2 on a live website. I get live users' data, about 2000
per day. I do an incremental index every 8 hours, but my search results
always show the same results with the same sorting order. When I check the same
search against the corresponding db, it always gives me different results (as new
data regularly gets added).

please suggest what might be the issue. is there any cache related problem
at SOLR level

thanks
pawan


Re: Managing solr machines (start/stop/status)

2011-09-14 Thread josh lucas
On Sep 13, 2011, at 5:05 PM, Jamie Johnson wrote:

 I know this isn't a solr specific question but I was wondering what
 folks do in regards to managing the machines in their solr cluster?
 Are there any recommendations for how to start/stop/manage these
 machines?  Any suggestions would be appreciated.


One thing I use is csshx (http://code.google.com/p/csshx/) on my Mac when 
dealing with the various boxes in our cluster.  You can issue commands in one 
terminal and they are duplicated in all other windows.  Very useful for global 
stop/starts and updates.

Re: Index not getting refreshed

2011-09-14 Thread Rahul Warawdekar
Hi Pawan,

Can you please share more details on the indexing mechanism ? (DIH,  SolrJ
or any other)
Please let us know the configuration details.


On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.comwrote:

 Hi

 I am using Solr 3.2 on a live website. i get live user's data of about 2000
 per day. I do an incremental index every 8 hours. but my search results
 always show the same result with same sorting order. when i check the same
 search from corresponding db, it gives me different results always (as new
 data regularly gets added)

 please suggest what might be the issue. is there any cache related problem
 at SOLR level

 thanks
 pawan




-- 
Thanks and Regards
Rahul A. Warawdekar


RE: DIH delta last_index_time

2011-09-14 Thread Claudia Robles
The solution that I am currently using is converting the last_index_time to UTC 
before comparing to the LastModified field in the DB.  

LastModified > DATEADD(Hour, DATEDIFF(Hour, GETDATE(), GETUTCDATE()), 
'${dataimporter.last_index_time}')

This may be another option if the LastModified date in the DB is stored in UTC.

Regards,
Claudia
-Original Message-
From: Vazquez, Maria (STM) [mailto:maria.vazq...@dexone.com] 
Sent: Wednesday, September 14, 2011 9:27 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH delta last_index_time

Thanks Rahul
That sounds like a good solution; I will change the code to support different 
timezones. Maybe this could be included in the next release of Solr, since a few 
people have mentioned this problem too.
Thanks again
Maria


Sent from my Motorola ATRIX™ 4G on ATT

-Original message-
From: Rahul Warawdekar rahul.warawde...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, Sep 14, 2011 14:01:08 GMT+00:00
Subject: Re: DIH delta last_index_time

Hi Maria/Gora,

I see this as more of a problem with the timezones in which the Solr server
and the database server are located.
Is this true ?
If yes, one more possibility of handling this scenario would be to customize
DataImportHandler code as follows

1. Add one more configuration property named dbTimeZone at the entity
level in data-config.xml file
2. While saving the lastIndexTime in the properties file, save it according
to the timezone specified in the config so that it is in sync with the
database
server time.

Basically customize the code so that all the time related updates to the
dataimport.properties file should be timezone specific.


On Wed, Sep 14, 2011 at 4:31 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Sep 14, 2011 at 11:23 AM, Maria Vazquez
 maria.vazq...@dexone.com wrote:
  Hi,
  How do you handle the situation where the time on the server running Solr
  doesn't match the time in the database?

 Firstly, why is that the case? NTP is pretty universal
 these days.

  I'm using the last_index_time saved by Solr in the delta query checking
 it
  against lastModifiedDate field in the database but the times are not in
 sync
  so I might lose some changes.
  Can we use something else other than last_index_time? Maybe something
 like
  last_pk or something.

 One possible way is to edit dataimport.properties, manually or through
 a script, to put the last_index_time back to a safe value.

 Regards,
 Gora




-- 
Thanks and Regards
Rahul A. Warawdekar



how would I use the new join feature given my schema.

2011-09-14 Thread Jason Toy
I've been reading the information on the new join feature and am not quite
sure how I would use it given my schema structure. I have User docs and
BlogPost docs and I want to return all BlogPosts that match the fulltext
title cool that belong to Users that match the description solr.

Here are the 2 docs I have:


<?xml version="1.0" encoding="UTF-8"?><add>

<doc><field name="class_name">User</field><field
name="login_s">jtoy</field><field name="user_id_i">192123</field><field
name="description_text">a solr user</field></doc>

<doc><field name="class_name">BlogPost</field><field
name="user_id_i">192123</field><field name="body_text">this is the
description</field><field name="title_text">this is a cool
title</field></doc>

</add><?xml version="1.0" encoding="UTF-8"?><commit/>


Is it possible to do this with the join functionality? If not, how would I
do this?
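
From the wiki examples, I would expect something along these lines
(untested, assuming the join parser on trunk and the dynamic field names
above):

q=class_name:BlogPost AND title_text:cool
fq={!join from=user_id_i to=user_id_i}(class_name:User AND description_text:solr)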

I'd appreciate any pointers or help on this.


Jason


Re: DIH delta last_index_time

2011-09-14 Thread Gora Mohanty
On Wed, Sep 14, 2011 at 9:56 PM, Vazquez, Maria (STM)
maria.vazq...@dexone.com wrote:
 Thanks Rahul
 That sounds like a good solution, I will change the code to support different 
 timezones. Maybe this could be included in next release of Solr since a few 
 people mentioned this problem too.
[...]

If it was indeed a timezone issue, Solr can hardly be automatically
aware of a difference in timezones. At least, IMHO, this is an
implementation issue, and a timezone difference should normally
be easy to diagnose.

Regards,
Gora


RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
How about this:  Start with just what you had in your query (q) without the 
filter queries.  Then add the fq's back in one at a time to see what is giving 
you problems -- leaving the birth filter query to the very last.

Others on the list more experienced with filter queries might have a more 
direct answer...

JRJ


-Original Message-
From: Yusuf Karakaya [mailto:karakaya...@gmail.com] 
Sent: Wednesday, September 14, 2011 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance troubles with solr

I tried moving the age query from the filter query to the normal query, but nothing
really changed.
But when I tried moving everything into the query itself (removing all filter
queries), QTimes slowed much more.
I don't have a problem with memory or CPU usage; my problem is query response
times.
When I send only one query, response times vary from 500 ms to 1000 ms (non-
cached), and that's too much.
When I send a set of random queries (10-20 queries per second), response
times go crazy (8 seconds to 60+ seconds).

On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote:

 I don't have enough experience with filter queries to advise well on when
 to use fq vs. putting it in the query itself, but I do know that we are not
 using filter queries, and with index sizes ranging from 7 Million to 27+
 Million we have not seen this kind of issue.

 Maybe keeping 16,384 filter queries around, particularly caching the ones
 with random age ranges is eating your memory up -- so perhaps try moving
 just that particular fq into q instead (since it is random) and just cache
 the ones where the number of options is limited?

 What happens if you try your test without the filter queries?  What happens
 if you put the additional criteria that are in your filter query into the
 query itself?

 JRJ

 -Original Message-
 From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
 Sent: Wednesday, September 14, 2011 9:54 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Performance troubles with solr

 Thank you for your reply.
 I tried to give most of the information i can but obviously i missed some.
 1.  Just what does your test script do?   Is it doing updates, or just
 queries of the sort you mentioned below?
 Test script only sends random queries.
 2.  If the test script is doing updates, how are those updates being fed to
 Solr?
 There are no updates right now, as i failed on performance.
 3.  What version of Solr are you running?
 I'm using Solr 3.3.0
 4.  Why did you increase the default for jetty (around 384m) to 6000m,
 particularly given your relatively modest number of documents (2,000,000).
 I was trying everything before asking here.
 5.  Machine characteristics, particularly operating system and physical
 memory on the machine.
 OS = Debian 6.0,  Physcal Memory = 32 gb, CPU = 2x Intel Quad Core

 On Wed, Sep 14, 2011 at 5:38 PM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov
 wrote:

  I think folks are going to need a *lot* more information.  Particularly
 
  1.  Just what does your test script do?   Is it doing updates, or just
  queries of the sort you mentioned below?
  2.  If the test script is doing updates, how are those updates being fed
 to
  Solr?
  3.  What version of Solr are you running?
  4.  Why did you increase the default for jetty (around 384m) to 6000m,
  particularly given your relatively modest number of documents
 (2,000,000).
  5.  Machine characteristics, particularly operating system and physical
  memory on the machine.
 
  Please refer to http://wiki.apache.org/solr/UsingMailingLists for
  additional guidance in using the mailing list to get help.
 
  -Original Message-
  From: Yusuf Karakaya [mailto:karakaya...@gmail.com]
  Sent: Wednesday, September 14, 2011 9:19 AM
  To: solr-user@lucene.apache.org
  Subject: Performance troubles with solr
 
  Hi, I'm having performance troubles with Solr. I don't know if I'm
  expecting
  too much from Solr or I misconfigured it.
  When I run a single query its QTime is 500-1000~ ms (without any use of
  caches).
  When I run my test script (with use of caches) QTime increases
  exponentially, reaching 8000~ to 60,000~ ms. And CPU usage also increases
  to
  ~550%
 
  My solr-start script:
  java -Duser.timezone=EET -Xmx6000m -jar ./start.jar
 
  2,000,000~ documents ,  currently there aren't any commits but in future
  there will be 5,000~ updates/additions to documents every 3-5~   min via
  delta import.
 
  Search Query
  sort=userscore+desc
  start=0
  q=photo_id:* AND gender:true AND country:MALAWI AND online:false
  fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]  ( Random age ranges )
  fq=lastlogin:[* TO NOW-6MONTHS/DAY] ( Only 2 options,   [* TO
  NOW-6MONTHS/DAY] or [NOW-6MONTHS/DAY TO *] )
  fq=userscore:[500 TO *]  ( Only 2 options, [500 TO *] or [* TO 500] )
  rows=150
 
  Schema
 
  <field name="id" type="long" indexed="true" stored="true"
 required="true"/>
  <field name="username" type="string" indexed="true" stored="false"
  

Re: Can index size increase when no updates/optimizes are happening?

2011-09-14 Thread Erick Erickson
What is the machine used for? Was your user looking at
a master? Slave? Something used for both?

Measuring the size of all the files in the index? Or looking
at memory?

The index files shouldn't be getting bigger unless there
were indexing operations going on. Is it at all possible that
DIH was configured to run automatically (or any other
indexing job for that matter) and your user didn't realize it?

Best
Erick

2011/9/13 Yury Kats yuryk...@yahoo.com:
 One of my users observed that the index size (in bytes)
 increased overnight. There was no indexing activity
 at that time, only querying was taking place.

 Running optimize brought the index size back down to
 what it was when indexing finished the day before.

 What could explain that?




Re: select query does not find indexed pdf document

2011-09-14 Thread Erick Erickson
You can use copyField to put data from separate fields into a common
search field.
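
A minimal sketch, assuming your two fields are named "filename" and
"content":

<copyField source="filename" dest="text"/>
<copyField source="content" dest="text"/>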

This page will help you get started on what mods you'd need to make on
a fieldType
to analyze it as you wish:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

But at a start think about WhitespaceTokenizer followed by
LowerCaseFilterFactory
AsciiFoldingFilterFactory
NGramFilterFactory
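
Roughly, a fieldType sketch along those lines (the ngram sizes are
placeholders you'd tune):

<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>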


Pay attention to the note at the top that directs you to the full
list, the page above contains
a partial list. For instance, NGramFilterFactory isn't on that page; it's
on the page that's linked
to: 
http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html

Best
Erick

On Tue, Sep 13, 2011 at 10:46 PM, Michael Dockery
dockeryjava...@yahoo.com wrote:
 Thank you for your informative reply.

 I would like to start simple by combining both filename and content
   into the same default search field
    ...which my default schema xml calls "text"
 ...
 <defaultSearchField>text</defaultSearchField>
 ...

 also:
 -case and accent insensitive
 -no splits on numb3rs
 -no highlights
 -text processing same for index and search

 however I do like
 -I like ngrams preferably (partial/prefix word/token search)


 what schema mod's would be needed?

 also what curl syntax to submit/index a pdf (with filename and content 
 combined into the default search field)?



 
 From: Bob Sandiford bob.sandif...@sirsidynix.com
 To: Michael Dockery dockeryjava...@yahoo.com
 Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Monday, September 12, 2011 1:38 PM
 Subject: RE: select query does not find indexed pdf document

 Hi, Michael.

 Well, the stock answer is, 'it depends'

 For example - would you want to be able to search filename without searching 
 file contents, or would you always search both of them together?  If both, 
 then copy both the file name and the parsed file content from the pdf into a 
 single search field, and you can set that up as the default search field.

 Or - what kind of processing / normalizing do you want on this data?  Case 
 insensitive?  Accent insensitive?  If a 'word' contains camel case (e.g. 
 TheVeryIdea), do you want that split on the case changes?  (but then watch 
 out for things like iPad)  If a 'word' contains numbers, do want them left 
 together, or separated?  Do you want stemming (where searching for 'stemming' 
 would also find 'stem', 'stemmed', that sort of thing?)  Is this always 
 English, or are the other languages involved.  Do you want the text 
 processing to be the same for indexing vs searching?  Do you want to be able 
 to find hits based on the first few characters of a term?  (ngrams)

 Do you want to be able to highlight text segments where the search terms were 
 found?

 probably you want to read up on the various tokenizers and filters that are 
 available.  Do some prototyping and see how it looks.

 Here's a starting point: 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 Basically, there is no 'one size fits all' here.  Part of the power of Solr / 
 Lucene is its configurability to achieve the results your business case calls 
 for.  Part of the drawback of Solr / Lucene - especially for new folks - is 
 its configurability to achieve the results your business case calls for. :)

 Anyone got anything else to suggest for Michael?

 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
 www.sirsidynix.comhttp://www.sirsidynix.com/

 From: Michael Dockery [mailto:dockeryjava...@yahoo.com]
 Sent: Monday, September 12, 2011 1:18 PM
 To: Bob Sandiford
 Subject: Re: select query does not find indexed pdf document

 thank you.  that worked.

 Any tips for   very   very  basic setup of the schema xml?
    or is the default basic enough?

 I basically only want to search on
         filename and file contents


 From: Bob Sandiford bob.sandif...@sirsidynix.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Michael 
 Dockery dockeryjava...@yahoo.com
 Sent: Monday, September 12, 2011 10:04 AM
 Subject: RE: select query does not find indexed pdf document

 Um - looks like you specified your id value as "pdfy", which is reflected in 
 the results from the *:* query, but your id query is searching for "vpn", 
 hence no matches...

 What does this query yield?

 http://www/SearchApp/select/?q=id:pdfy

 Bob Sandiford | Lead Software Engineer | SirsiDynix
 P: 800.288.8020 X6943 | 
 bob.sandif...@sirsidynix.commailto:bob.sandif...@sirsidynix.com
 www.sirsidynix.com

 -Original Message-
 From: Michael Dockery 
 [mailto:dockeryjava...@yahoo.commailto:dockeryjava...@yahoo.com]
 Sent: Monday, September 12, 2011 9:56 AM
 To: solr-user@lucene.apache.orgmailto:solr-user@lucene.apache.org
 Subject: Re: select query does not find indexed pdf document

 http://www/SearchApp/select/?q=id:vpn

 yeilds this:
   <?xml version="1.0" encoding="UTF-8" ?>
 - <response>
 - 

RegexTransformer - need help with regex value

2011-09-14 Thread Pulkit Singhal
Hello,

Feel free to point me to alternate sources of information if you deem
this question unworthy of the Solr list :)

But until then please hear me out!

When my config is something like:
<field column="imageUrl"
   regex=".*img src=.(.*)\.gif..alt=.*"
   sourceColName="description"
   />
I don't get any data.

But when my config is like:
<field column="imageUrl"
   regex=".*img src=.(.*)..alt=.*"
   sourceColName="description"
   />
I get the following data as the value for imageUrl:
http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif;
width=64

As the result shows, this is a string that should be able to match
even on the 1st regex=".*img src=.(.*)\.gif..alt=.*" and produce a
result like:
http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_
But it doesn't!
Can anyone tell me why that would be the case?
Is it something about the way RegexTransformer is wired or is it just
my regex value that isn't right?


Re: query - part default OR and part default AND

2011-09-14 Thread Chris Hostetter

: phrase1 - solr is the best fts ever
: phrase2 - let us all contribute to open source for a better world
: 
: now I want to perform the next query:
: 
: field1:( phrase1) AND field2:(phrase2)
: 
: my default operator is AND, but I want to search within field1 with AND
: operator between the tokens and within field2 with OR operator.
...
: field1:(+solr +is +the +best +fts +ever) AND field2:(let us all contribute
: to open source for a better world)

First off -- be careful about your wording.  you are calling these 
phrases but in these examples, what you are really doing is searching 
for a set of terms.  there's no such thing as an OR phrase search -- 
when searching for phrases the entire phrase is mandatory, your only 
option is if you want to include any slop in how far apart the 
individual terms may be.

having said that: if what you want to do is search all of a set of 
terms in field1, and any of a set of terms in field2, you can use 
localparams and the _query_ hook in the LuceneQParser to split this up 
into multiple params where you specify a different default op...

q=_query_:"{!q.op=AND df=field1 v=$f1}" _query_:"{!q.op=OR df=field2 v=$f2}"
f1=solr is the best fts ever
f2=let us all contribute to open source for a better world

: what i already tried is to split phrase1 by whitespaces, changing the
: default search operator to OR in the schema and add + signs before each
: word:
...
: this query is not good.. because when I am splitting phrase1 it is not how
: the index time tokenizer splits it... so I am not getting the results I

Second: this isn't how the Lucene QueryParser works: whitespace is a 
metacharacter for the queryparser; it splits on (unescaped) whitespace to 
determine individual clauses (which are then used to build boolean 
queries) before it ever consults the analyzer for the specified field (it 
actually doesn't even know which field it should use until it evaluates 
the whitespace it's parsing)

Reading between the lines, i *think* what you are saying is that you want 
to search for an exact phrase on field1, but any of the words in field2, 
which is as simple as...

field1:"solr is the best fts ever" AND field2:(let us all contribute to open 
source for a better world)

...of course, if it's easier for your client to specify those as distinct 
params, you can still use the _query_ hook and local params, along with 
the field QParser..

q=_query_:"{!field f=field1 v=$f1}" _query_:"{!q.op=OR df=field2 v=$f2}"
f1=solr is the best fts ever
f2=let us all contribute to open source for a better world

...Lots of options.

https://wiki.apache.org/solr/SolrQuerySyntax



-Hoss


Re: math with date and modulo

2011-09-14 Thread Chris Hostetter

: I try to get a diff of today and an dateField. from this diff, i want do a
: modulo from another field with values of 1,3,6,12
...
: (DIFF(Month of Today - Month of Search) MOD interval) = 0

a) it looks like modulus was never implemented as a function ... probably 
overlooked because it has no java.lang.Math.* static equivalent ... please file a 
bug, it should be fairly trivial to add.

b) even with a mod(a,b) function, i'm not sure that you could do what it 
seems like you want to do -- the ms() function will easily let you compute 
the number of milliseconds between two date fields, but there's no function 
that will give you the numeric value for the month of year (or day of 
month, or hour of day, etc...) for a date field ... even if there was, i 
don't think your calculation would work when the current month is before 
the month indexed in the date field.


If i understand your goal, you are probably better off indexing the month 
as its own field (either numerically or just as a simple string) and then 
computing the list of matches you care about in the client (ie: 
fq=month:(feb, may, aug, nov) )

-Hoss


Re: RegexTransformer - need help with regex value

2011-09-14 Thread Pulkit Singhal
Thanks a bunch, got it working with a reluctant qualifier and the use
of &quot; as the escaped representation of double quotes within the
regex value so that the config file doesn't crash & burn:

<field column="imageUrl"
   regex=".*?img src=&quot;(.*?)&quot;.*"
   sourceColName="description"
   />

Cheers,
- Pulkit

On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
 Hello,

 Feel free to point me to alternate sources of information if you deem
 this question unworthy of the Solr list :)

 But until then please hear me out!

 When my config is something like:
            <field column="imageUrl"
                   regex=".*img src=.(.*)\.gif..alt=.*"
                   sourceColName="description"
                   />
 I don't get any data.

 But when my config is like:
            <field column="imageUrl"
                   regex=".*img src=.(.*)..alt=.*"
                   sourceColName="description"
                   />
 I get the following data as the value for imageUrl:
 http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif;
 width=64

 As the result shows, this is a string that should be able to match
 even on the 1st regex=".*img src=.(.*)\.gif..alt=.*" and produce a
 result like:
 http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_
 But it doesn't!
 Can anyone tell me why that would be the case?
 Is it something about the way RegexTransformer is wired or is it just
 my regex value that isn't right?



Norms - scoring issue

2011-09-14 Thread Adolfo Castro Menna
Hi All,

I hope someone can shed some light on the issue I'm facing with solr
3.1.0. It looks like it's computing different fieldNorm values despite my
configuration that aims to ignore them.

   <field name="item_name" type="textgen" indexed="true" store="true"
omitNorms="true" omitTermFrequencyAndPositions="true" />
   <field name="item_description" type="textTight" indexed="true"
store="true" omitNorms="true" omitTermFrequencyAndPositions="true" />
   <field name="item_tags" type="text" indexed="true" stored="true"
multiValued="true" omitNorms="true" omitTermFrequencyAndPositions="true" />

I also have a custom class that extends DefaultSimilarity to override the
idf method.

Query:

<str name="q">item_name:octopus seafood OR item_description:octopus seafood
OR item_tags:octopus seafood</str>
<str name="sort">score desc,item_ranking desc</str>

The first 2 results are:
<doc>
<float name="score">0.5217492</float>
<str name="item_name">Grilled Octopus</str>
<arr name="item_tags"><str>Seafood, tapas</str></arr>
</doc>
<doc>
<float name="score">0.49379835</float>
   <str name="item_name">octopus marisco</str>
   <arr name="item_tags"><str>Appetizer, Mexican, Seafood, food</str></arr>
</doc>

Does anyone know why they get a different score? I'm expecting them to have
the same scoring because both matched the two search terms.

I checked the debug information and it seems that the difference involves
the fieldNorm values.

1) Grilled Octopus
0.52174926 = (MATCH) product of:
  0.7826238 = (MATCH) sum of:
0.4472136 = (MATCH) weight(item_name:octopus in 69), product of:
  0.4472136 = queryWeight(item_name:octopus), product of:
1.0 = idf(docFreq=2, maxDocs=449)
0.4472136 = queryNorm
  1.0 = (MATCH) fieldWeight(item_name:octopus in 69), product of:
1.0 = tf(termFreq(item_name:octopus)=1)
1.0 = idf(docFreq=2, maxDocs=449)
1.0 = fieldNorm(field=item_name, doc=69)
0.1118034 = (MATCH) weight(text:seafood in 69), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.25 = fieldNorm(field=text, doc=69)
0.1118034 = (MATCH) weight(text:seafood in 69), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.25 = fieldNorm(field=text, doc=69)
0.1118034 = (MATCH) weight(text:seafood in 69), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.25 = fieldNorm(field=text, doc=69)
  0.667 = coord(4/6)

2) octopus marisco

0.49379835 = (MATCH) product of:
  0.7406975 = (MATCH) sum of:
0.4472136 = (MATCH) weight(item_name:octopus in 81), product of:
  0.4472136 = queryWeight(item_name:octopus), product of:
1.0 = idf(docFreq=2, maxDocs=449)
0.4472136 = queryNorm
  1.0 = (MATCH) fieldWeight(item_name:octopus in 81), product of:
1.0 = tf(termFreq(item_name:octopus)=1)
1.0 = idf(docFreq=2, maxDocs=449)
1.0 = fieldNorm(field=item_name, doc=81)
0.09782797 = (MATCH) weight(text:seafood in 81), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.21875 = fieldNorm(field=text, doc=81)
0.09782797 = (MATCH) weight(text:seafood in 81), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.21875 = fieldNorm(field=text, doc=81)
0.09782797 = (MATCH) weight(text:seafood in 81), product of:
  0.4472136 = queryWeight(text:seafood), product of:
1.0 = idf(docFreq=8, maxDocs=449)
0.4472136 = queryNorm
  0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
1.0 = tf(termFreq(text:seafood)=1)
1.0 = idf(docFreq=8, maxDocs=449)
0.21875 = fieldNorm(field=text, doc=81)
  0.667 = coord(4/6)

Thanks in advance,
Adolfo.


[ANNOUNCE] Apache Solr 3.4.0 released

2011-09-14 Thread Michael McCandless
September 14 2011, Apache Solr™ 3.4.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.4.0.

Apache Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its major features include
powerful full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
distributed search and index replication, and it powers the search and
navigation features of many of the world's largest internet sites.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:

   http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below).

If you are already using Apache Solr 3.1, 3.2 or 3.3, we strongly
recommend you upgrade to 3.4.0 because of the index corruption bug on OS
or computer crash or power loss (LUCENE-3418), now fixed in 3.4.0.

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.4.0 Release Highlights:

  * Bug fixes and improvements from Apache Lucene 3.4.0, including a
major bug (LUCENE-3418) whereby a Lucene index could
easily become corrupted if the OS or computer crashed or lost
power.

  * SolrJ client can now parse grouped and range facets results
(SOLR-2523).

  * A new XsltUpdateRequestHandler allows posting XML that's
transformed by a provided XSLT into a valid Solr document
(SOLR-2630).

  * Post-group faceting option (group.truncate) can now compute
facet counts for only the highest ranking documents per-group.
(SOLR-2665).

  * Add commitWithin update request parameter to all update handlers
that were previously missing it.  This tells Solr to commit the
change within the specified amount of time (SOLR-2540).

  * You can now specify NIOFSDirectory (SOLR-2670).

  * New parameter hl.phraseLimit speeds up FastVectorHighlighter
(LUCENE-3234).

  * The query cache and filter cache can now be disabled per request
See http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
(SOLR-2429).

  * Improved memory usage, build time, and performance of
SynonymFilterFactory (LUCENE-3233).

  * Added omitPositions to the schema, so you can omit position
information while still indexing term frequencies (LUCENE-2048).

  * Various fixes for multi-threaded DataImportHandler.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Apache Lucene/Solr Developers


Re: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Chris Hostetter

: References: 41dfe0136ddf091e98d45dea9f0da1ab@localhost
:  cab_8yd9obtkvkdktqpfnuzmey-afbzajyvgahh58+mccgiq...@mail.gmail.com
: Message-ID: 1316011545.626.yahoomail...@web110411.mail.gq1.yahoo.com
: Subject: glassfish, solrconfig.xml  and SolrException: Error loading
:  DataImportHandler
: In-Reply-To:
: cab_8yd9obtkvkdktqpfnuzmey-afbzajyvgahh58+mccgiq...@mail.gmail.com

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


-Hoss


Re: Performance troubles with solr

2011-09-14 Thread Chris Hostetter

: q=photo_id:* AND gender:true AND country:MALAWI AND online:false

photo_id:* does not mean what you probably think it means.  you most 
likely want photo_id:[* TO *] given your current schema, but i would 
recommend adding a new has_photo boolean field and using that instead.

that alone should explain a big part of why those queries would be slow.

you didn't describe how your q param varies in your test queries (just 
your fq).  I'm assuming gender and online can vary, and that you 
sometimes don't use the photo_id clauses, and that the country clause 
can vary, but that these clauses are always all mandatory.

in which case i would suggest using fq for all of them individually, and 
leaving your q param as *:* (unless you sometimes sort on the actual 
solr score, in which case leave it as whatever part of the query you 
actually want to contribute to the score)
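
i.e. something like this (a sketch, with the suggested has_photo field in 
place of the photo_id clause):

q=*:*
fq=has_photo:true
fq=gender:true
fq=country:MALAWI
fq=online:false
fq=birth:[NOW-31YEARS/DAY TO NOW-17YEARS/DAY]
fq=lastlogin:[* TO NOW-6MONTHS/DAY]
fq=userscore:[500 TO *]
sort=userscore desc
rows=150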

Lastly: I don't remember off the top of my head how int and tint are 
defined in the example schema files, but you should consider your 
usage of them carefully -- particularly with the precisionStep and which 
fields you do range queries on.



-Hoss


facet.method=fc

2011-09-14 Thread Patrick Sauts
Is the parameter facet.method=fc still needed ?

Thank you.

Patrick.



Re: facet.method=fc

2011-09-14 Thread Chris Hostetter

: Is the parameter facet.method=fc still needed ?

https://wiki.apache.org/solr/SimpleFacetParameters#facet.method

The default value is fc (except for BoolField) since it tends to use less 
memory and is faster when a field has many unique terms in the index. 


-Hoss


Re: Index not getting refreshed

2011-09-14 Thread Chris Hostetter

: I am using Solr 3.2 on a live website. i get live user's data of about 2000
: per day. I do an incremental index every 8 hours. but my search results
: always show the same result with same sorting order. when i check the same

Are you committing?

Are you using replication?

Are you using a sort order that might not make it obvious that the new 
docs are actually there? (ie: sort=timestamp asc)


-Hoss


Document frequency for all documents found by a query

2011-09-14 Thread Tomek Rej
Hi there

I'm using Solr to do some category mapping, and part of this process
consists of finding frequently occurring terms for each category id.
My index consists of a number of documents (mostly containing between 1 and
4 tokens), and a category id that this document belongs to.
Ideally I'd like to generate document frequencies for each term restricted
by category, but when I use the following http request it gives me the
frequencies over
the whole index (ignoring the category ids).
http://localhost:8983/solr/select?qt=tvrh&q=category_id:9&fl=x&tv.all=true&rows=1000

Is it possible to make Solr return document frequency over just the
documents returned from the query? If not what is the proper way to do this?

Thanks,
Tomek Rej


Re: Document frequency for all documents found by a query

2011-09-14 Thread Tomek Rej
Nevermind I just discovered faceting which does exactly what I want.
Sorry about that.
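
For the archives, a faceting request along these lines does it (the 
facet.limit value is just illustrative):

http://localhost:8983/solr/select?q=category_id:9&facet=true&facet.field=x&facet.limit=-1&facet.mincount=1&rows=0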

On Thu, Sep 15, 2011 at 11:31 AM, Tomek Rej tomek@roamz.com wrote:

 Hi there

 I'm using Solr to do some category mapping, and part of this process
 consists of finding frequently occurring terms for each category id.
 My index consists of a number of documents (mostly containing between 1 and
 4 tokens), and a category id that this document belongs to.
 Ideally I'd like to generate document frequencies for each term restricted
 by category, but when I use the following http request it gives me the
 frequencies over
 the whole index (ignoring the category ids).

 http://localhost:8983/solr/select?qt=tvrh&q=category_id:9&fl=x&tv.all=true&rows=1000

 Is it possible to make Solr return document frequency over just the
 documents returned from the query? If not what is the proper way to do this?

 Thanks,
 Tomek Rej



Re: any docs on using the GeoHashField?

2011-09-14 Thread Peter Wolanin
When I retrieve the value, the lat/lon pair that comes out is not
exactly the same as what I indexed, which made me think it was
actually stored as the hash and then transformed back?

Anyhow - I'm trying to understand the actual use case for the field as
it exists - essentially you are saying I could query with a geohash
and use data in this field type to do a distance-based filter from the
lat,lon point corresponding to the geohash?

-Peter

On Thu, Sep 8, 2011 at 5:34 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : I would think I could index a lat,lon pair into a GeoHashField (that
 : works) and then retrieve the field value to see the computed geohash.
        ...
 : What am I missing - how can I retrieve the hash?

 I don't think it's designed to work that way.

 GeoHashField provides GeoHash based search support for lat/lon values
 through its internal (indexed) representation -- much like TrieLongField
 provides efficient range queries using trie encoding -- but the stored
 value is still the lat/lon pair (just as a TrieLongField is still the long
 value)

 If you want to store/retrieve a raw GeoHash string, i think you have to
 compute it yourself (or put the logic in an UpdateProcessor).

 org.apache.lucene.spatial.geohash.GeoHashUtils should take care of all the
 heavy lifting for you.

 -Hoss




-- 
Peter M. Wolanin, Ph.D.      : Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com : 781-313-8322

Get a free, hosted Drupal 7 site: http://www.drupalgardens.com


Complex Fields, Indexing Storing

2011-09-14 Thread Mike Krieger
 Hi all,  

I have a quick question about how complex fields (subFields etc) interact with 
indexing and storage.

Let's say I have this (partial) schema:

<fieldType name="location" class="solr.LatLonType" 
subFieldSuffix="_coordinate"/>
…
<field name="pnt" type="location" indexed="true" stored="true" required="true" 
/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>


(slightly modified from the example/ project).

If my goals were to:

1) Always query the location by lat,lng (never the subfields directly)
2) Never need to pull out the 'pnt' value from results (I always just need the 
document's ID back, not its contents)
3) Not have any unnecessary indexes

Is there anything that I should change about the set-up above? Will the schema 
as it stands index both the two dynamic fields that get created, as well as the 
'pnt' field above them (in effect creating 3 indexes)? Or is the 
indexed='true' on the 'pnt' field just needed so it properly connects to the 
two indexes created for the dynamic fields?

Thanks in advance,
Mike


Re: Index not getting refreshed

2011-09-14 Thread Pawan Darira
I am committing but not doing replication now. My sort order also includes
last login timestamp. The new profiles are being reflected in my SOLR admin
& db, but they're not listed on my website.

On Thu, Sep 15, 2011 at 4:25 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I am using Solr 3.2 on a live website. i get live user's data of about
 2000
 : per day. I do an incremental index every 8 hours. but my search results
 : always show the same result with same sorting order. when i check the
 same

 Are you committing?

 Are you using replication?

 Are you using a sort order that might not make it obvious that the new
 docs are actually there? (ie: sort=timestamp asc)


 -Hoss



Re: Index not getting refreshed

2011-09-14 Thread Pawan Darira
I have written simple Java code to index my data. I am creating XML
documents & adding them to the index. Sorry, but due to company policy I could
not share the configuration details here.

On Wed, Sep 14, 2011 at 10:42 PM, Rahul Warawdekar 
rahul.warawde...@gmail.com wrote:

 Hi Pawan,

 Can you please share more details on the indexing mechanism ? (DIH,  SolrJ
 or any other)
 Please let us know the configuration details.


 On Wed, Sep 14, 2011 at 12:48 PM, Pawan Darira pawan.dar...@gmail.com
 wrote:

  Hi
 
  I am using Solr 3.2 on a live website. i get live user's data of about
 2000
  per day. I do an incremental index every 8 hours. but my search results
  always show the same result with same sorting order. when i check the
 same
  search from corresponding db, it gives me different results always (as
 new
  data regularly gets added)
 
  please suggest what might be the issue. is there any cache related
 problem
  at SOLR level
 
  thanks
  pawan
 



 --
 Thanks and Regards
 Rahul A. Warawdekar



Re: indexing data from rich documents - Tika with solr3.1

2011-09-14 Thread scorpking
Hi Erick Erickson, 
Now, we have many file formats (doc, ppt, pdf, ...); the files' purpose is to
serve detailed search of educational content in those files. Because I am new to
Solr, maybe I don't yet understand Apache Tika in enough depth. At the moment I can't
index pdf files over http, though a single file works fine. Thanks for your attention. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3337963.html
Sent from the Solr - User mailing list archive at Nabble.com.