Re: Commit Issue in Solr 3.4

2014-02-08 Thread samarth s
Yes, it is Amazon EC2 indeed.

To expand on that:
This Solr deployment was working fine, handling the same load, on a 34 GB
instance with EBS storage for quite some time. To reduce the time taken by a
commit, I shifted it to a 30 GB SSD instance. It definitely performed better
on writes and commits. But since last week I have been facing this problem
of endless back-to-back commits. Not being able to resolve it, I have
finally switched back to a 34 GB machine with EBS storage, and now the
commits are working fine, though slow.

Any thoughts?
On 6 Feb 2014 23:00, Shawn Heisey s...@elyograg.org wrote:

 On 2/6/2014 9:56 AM, samarth s wrote:
  Size of index = 260 GB
  Total Docs = 100mn
  Usual writing speed = 50K per hour
  autoCommit-maxDocs = 400,000
  autoCommit-maxTime = 1,500,000 ms (25 mins)
  merge factor = 10
 
  M/c memory = 30 GB, Xmx = 20 GB
  Server - Jetty
  OS - Cent OS 6
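 Those autoCommit values correspond to a solrconfig.xml block roughly like
 this (a sketch, assuming the Solr 3.x syntax):

 <updateHandler class="solr.DirectUpdateHandler2">
   <autoCommit>
     <maxDocs>400000</maxDocs>
     <maxTime>1500000</maxTime>
   </autoCommit>
 </updateHandler>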

 With 30GB of RAM (is it Amazon EC2, by chance?) and a 20GB heap, you
 have about 10GB of RAM left for caching your Solr index.  If that server
 has all 260GB of index, I am really surprised that you have only been
 having problems for a short time.  I would have expected problems from
 day one.  Even if it only has half or one quarter of the index, there is
 still a major discrepancy in RAM vs. index size.

 You either need more memory or you need to reduce the size of your
 index.  The size of the indexed portion generally has more of an impact
 on performance than the size of the stored portion, but they do both
 have an impact, especially on indexing and committing.  With regular
 disks, it's best to have at least 50% of your index size available to
 the OS disk cache, but 100% is better.

 http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

 If you are already using SSD, you might think there can't be
 memory-related performance problems ... but you still need a pretty
 significant chunk of disk cache.

 https://wiki.apache.org/solr/SolrPerformanceProblems#SSD

 Thanks,
 Shawn




Re: Index a new record in MySQL

2014-02-08 Thread tamanjit.bin...@yahoo.co.in
But delta indexing is supposed to do that. Please share the details of your
dataimport config (data-config.xml) and your Solr version for further help.





Re: Index a new record in MySQL

2014-02-08 Thread PeriS
Hi,

The version of Solr is 4.6, and here is my db-import-config.xml:

<dataConfig>
  <dataSource name="dbDS" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="[fill me]"
              user="[fill me]" password="[fill me]"/>

  <document name="[example name]">

    <entity name="[table name]" pk="id"
            transformer="com.example.sometransformer"
            query="SELECT * from [table]"
            deltaImportQuery="SELECT * from [table] where [pk]='${dataimporter.delta.pk}'"
            deltaQuery="SELECT [pk] from [table] where last_modified &gt; '${dataimporter.last_index_time}'">
      <field column="[col]" name="[col name]"/>
      <field column="last_modified" name="lastModified"/>
    </entity>

  </document>
</dataConfig>
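
A delta-import against this config is typically triggered with a request
like the following (the core and handler path depend on the setup):

http://localhost:8983/solr/dataimport?command=delta-import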





On Feb 8, 2014, at 7:29 AM, tamanjit.bin...@yahoo.co.in wrote:

 But delta indexing is supposed to do that. Please share the details of your
 dataimport config (data-config.xml) and your Solr version for further help.
 
 
 
 
 
 
 







Re: Index a new record in MySQL

2014-02-08 Thread tamanjit.bin...@yahoo.co.in
The config seems OK. Have you checked the date in the properties file before
a delta-import? MySQL can give issues with the date format. Also, check your
web container/Solr logs to see whether they show any exception when you call
a delta-import.
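
For reference, that file is conf/dataimport.properties, and it should contain
a timestamp along these lines (a sketch; the header comment varies):

#Sat Feb 08 10:15:23 UTC 2014
last_index_time=2014-02-08 10\:15\:23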





Re: Solr Composite Unique key from existing fields in schema

2014-02-08 Thread tamanjit.bin...@yahoo.co.in
You could use a combination of all your composite key columns and put them in
a field in Solr, which can then be used as the unique key. That is, if you
have two columns c1 and c2, you could have a field in Solr whose value is
c1_c2, or something along those lines.
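
A minimal sketch of one way to build that value with the DataImportHandler's
TemplateTransformer (entity and column names here are illustrative):

<entity name="mytable" pk="uid"
        transformer="TemplateTransformer"
        query="SELECT c1, c2, other_col FROM mytable">
  <field column="uid" template="${mytable.c1}_${mytable.c2}"/>
</entity>

with <uniqueKey>uid</uniqueKey> declared in schema.xml.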






Re: Solr Composite Unique key from existing fields in schema

2014-02-08 Thread tamanjit.bin...@yahoo.co.in
Also take a look at 
http://wiki.apache.org/solr/UniqueKey#Use_cases_which_require_a_unique_key_generated_from_data_in_the_document





Re: SolrCloud: problems with delete

2014-02-08 Thread Erick Erickson
What version of Solr?

If it works with commit=true, then it's working just fine.
Index changes aren't visible for searchers until a commit occurs.

How are you determining whether the doc is gone or not?
A Solr search (which depends on commits) or examining the
index from a low level? If the latter, then docs are just marked
as deleted, they aren't physically removed until segments
are merged.
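
For example, deleting by id and making the deletion visible in one step is a
POST to /update?commit=true with a body like this (the ids are illustrative):

<delete>
  <id>doc1</id>
  <id>doc2</id>
</delete>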

Best
Erick

On Fri, Feb 7, 2014 at 4:25 AM, ku3ia dem...@gmail.com wrote:
 autocommit and /update/?commit=true work fine. For example, I send 807632
 docs to be indexed on my 3-shard cluster - everything is fine. But when I
 try to remove them using POST requests with a small number of ids, let's
 say 100 per request, some docs are still in the index even though they
 should not be. When I send a POST request with a larger number of ids, like
 1K or 2K, all docs are deleted from the index.





Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Mark Miller
If that is the case, we really have to dig in. Given the error, the first
thing I would assume is that an old solrj jar (something before 4.6.1) is
mixed in with the 4.6.1 solrj jar or install.

- Mark  

http://about.me/markrmiller



On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote: Hey,
yeah, blew it on this one. Someone just reported it the other day - the way
that a bug was fixed was not backward- and forward-compatible. The first
implementation was wrong.

You have to update the other nodes to 4.6.1 as well.  

I’m going to look at some scripting test that can help check for this type of 
thing.

- Mark  

http://about.me/markrmiller



On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote: I
have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
4.6.1, and indexing ceased (the indexer returned No live servers for shard,
but the real root cause from the Solr servers is below). Note that SolrJ
4.6.1 is fine for the query side, just not for adding documents.



21:35:21.508 [qtp1418442930-22296231] ERROR
o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException:
Unknown type 19
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:724)


Re: Tf-Idf for a specific query

2014-02-08 Thread Erick Erickson
David:

If you're, say, faceting on fields with lots of unique values, this
will be quite expensive.
No idea whether you can tolerate slower queries or not, just sayin'

Erick
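
For readers following the thread: the facet-based approximation of result-set
DF that Mikhail suggests below boils down to a request along these lines (a
sketch; the field and collection names are illustrative, and the field must
be indexed/tokenized):

http://localhost:8983/solr/select?q=your+query&rows=0&facet=true&facet.field=text&facet.limit=100&facet.mincount=1

Each facet count is the number of matching documents containing that term,
i.e. the DF computed only over the query's result set.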

On Fri, Feb 7, 2014 at 5:35 PM, David Miller davthehac...@gmail.com wrote:
 Thanks Mikhail,

 It seems that this was what I was looking for. Being new to this, I wasn't
 aware of such a use of facets.

 Now I can probably combine the term vectors and facets to fit my scenario.

 Regards,
 Dave


 On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com
 wrote:

 David,

 I can imagine that DF for a result set is facets!


 On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com
 wrote:

  Hi Mikhail,
 
  The DF seems to be based on the entire document set. What I require is
  based on the results of a single query.

  Suppose my Solr query returns a set of 50K documents from a superset of
  10 million documents; I require the DF to be calculated just on those 50K
  documents. But currently it seems to be calculated on the entire doc set.

  So, is there any way to get the DF or IDF based just on the docs
  returned by the query?
 
  Regards,
  Dave
 
 
 
 
 
 
 
  On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev 
  mkhlud...@griddynamics.com
   wrote:
 
   Hello Dave
   you can get DF from http://wiki.apache.org/solr/TermsComponent (invert
  it
   yourself)
   then, for certain term you can get number of occurrences per document
 by
   http://wiki.apache.org/solr/FunctionQuery#tf
  
  
  
   On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com
   wrote:
  
Hi Guys..
   
    I require to obtain the tf-idf score from Solr for a certain set of
    documents. But the catch is that I need the IDF (or DF) to be calculated
    on the documents returned by the specific query and not the entire
    corpus.

    Please provide me some hint on whether Solr has this feature, or if I
    can use the Lucene API directly to achieve this.
   
   
Thanks in advance,
Dave
   
  
  
  
   --
   Sincerely yours
   Mikhail Khludnev
   Principal Engineer,
   Grid Dynamics
  
   http://www.griddynamics.com
mkhlud...@griddynamics.com
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
verified 4.6.1 is definitely winning and included alone when it breaks.


On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote:

 If that is the case we really have to dig in. Given the error, the first
 thing I would assume is that you have an old solrj jar or something before
 4.6.1 involved with a 4.6.1 solrj jar or install.

 - Mark

 http://about.me/markrmiller



 On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote:
 Hey, yeah, blew it on this one. Someone just reported it the other day -
 the way that a bug was fixed was not back and forward compatible. The first
 implementation was wrong.

 You have to update the other nodes to 4.6.1 as well.

 I’m going to look at some scripting test that can help check for this type
 of thing.

 - Mark

 http://about.me/markrmiller



 On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote:
 I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
 4.6.1 and indexing ceased (indexer returned No live servers for shard but
 the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
 fine for the query side, just not adding documents.




Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Mark Miller
If you look at the stack trace, the line numbers match 4.6.0 in the src, but 
not 4.6.1. That code couldn’t have been 4.6.1 it seems.

- Mark

http://about.me/markrmiller

On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com wrote:

 Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
 verified 4.6.1 is definitely winning and included alone when it breaks.
 
 
 On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com wrote:
 
 If that is the case we really have to dig in. Given the error, the first
 thing I would assume is that you have an old solrj jar or something before
 4.6.1 involved with a 4.6.1 solrj jar or install.
 
 - Mark
 
 http://about.me/markrmiller
 
 
 
 On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote:
 Hey, yeah, blew it on this one. Someone just reported it the other day -
 the way that a bug was fixed was not back and forward compatible. The first
 implementation was wrong.
 
 You have to update the other nodes to 4.6.1 as well.
 
 I’m going to look at some scripting test that can help check for this type
 of thing.
 
 - Mark
 
 http://about.me/markrmiller
 
 
 
 On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote:
 I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ
 4.6.1 and indexing ceased (indexer returned No live servers for shard but
 the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
 fine for the query side, just not adding documents.
 
 
 

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Oh, I was talking about my indexer. That stack is from my Solr servers -
very weird, since it's *supposed* to be 4.6.1. I'll dig in more, thanks.


On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller markrmil...@gmail.com wrote:

 If you look at the stack trace, the line numbers match 4.6.0 in the src,
 but not 4.6.1. That code couldn’t have been 4.6.1 it seems.

 - Mark

 http://about.me/markrmiller

 On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com wrote:

  Hmmm, I'm assembling into an uberjar that forces uniqueness of classes. I
  verified 4.6.1 is definitely winning and included alone when it breaks.
 
 
  On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
  If that is the case we really have to dig in. Given the error, the first
  thing I would assume is that you have an old solrj jar or something
 before
  4.6.1 involved with a 4.6.1 solrj jar or install.
 
  - Mark
 
  http://about.me/markrmiller
 
 
 
  On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote:
  Hey, yeah, blew it on this one. Someone just reported it the other day -
  the way that a bug was fixed was not back and forward compatible. The
 first
  implementation was wrong.
 
  You have to update the other nodes to 4.6.1 as well.
 
  I’m going to look at some scripting test that can help check for this
 type
  of thing.
 
  - Mark
 
  http://about.me/markrmiller
 
 
 
  On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com
 wrote:
  I have Solr 4.6.1 on the server and just upgraded my indexer app to
 SolrJ
  4.6.1 and indexing ceased (indexer returned No live servers for shard
 but
  the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
  fine for the query side, just not adding documents.
 
 
 

Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19

2014-02-08 Thread Brett Hoerner
Mark, you were correct. I realized I was still running a prerelease of
4.6.1 (by a handful of commits). Bounced them with proper 4.6.1 and we're
all good, sorry for the spam. :)


On Sat, Feb 8, 2014 at 10:29 AM, Brett Hoerner br...@bretthoerner.comwrote:

 Oh, I was talking about my indexer. That stack is from my Solr servers,
 very weird since it's *supposed* to be 4.6.1 . I'll dig in more, thanks.


 On Sat, Feb 8, 2014 at 10:21 AM, Mark Miller markrmil...@gmail.comwrote:

 If you look at the stack trace, the line numbers match 4.6.0 in the src,
 but not 4.6.1. That code couldn’t have been 4.6.1 it seems.

 - Mark

 http://about.me/markrmiller

 On Feb 8, 2014, at 11:12 AM, Brett Hoerner br...@bretthoerner.com
 wrote:

  Hmmm, I'm assembling into an uberjar that forces uniqueness of classes.
 I
  verified 4.6.1 is definitely winning and included alone when it breaks.
 
 
  On Sat, Feb 8, 2014 at 9:44 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
  If that is the case we really have to dig in. Given the error, the
 first
  thing I would assume is that you have an old solrj jar or something
 before
  4.6.1 involved with a 4.6.1 solrj jar or install.
 
  - Mark
 
  http://about.me/markrmiller
 
 
 
  On Feb 7, 2014, 7:15:24 PM, Mark Miller markrmil...@gmail.com wrote:
  Hey, yeah, blew it on this one. Someone just reported it the other day
 -
  the way that a bug was fixed was not back and forward compatible. The
 first
  implementation was wrong.
 
  You have to update the other nodes to 4.6.1 as well.
 
  I’m going to look at some scripting test that can help check for this
 type
  of thing.
 
  - Mark
 
  http://about.me/markrmiller
 
 
 
  On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com
 wrote:
  I have Solr 4.6.1 on the server and just upgraded my indexer app to
 SolrJ
  4.6.1 and indexing ceased (indexer returned No live servers for
 shard but
  the real root from the Solr servers is below). Note that SolrJ 4.6.1 is
  fine for the query side, just not adding documents.
 
 
 

Re: Commit Issue in Solr 3.4

2014-02-08 Thread Shawn Heisey
On 2/8/2014 1:40 AM, samarth s wrote:
 Yes, it is Amazon EC2 indeed.

 To expand on that:
 This Solr deployment was working fine, handling the same load, on a 34 GB
 instance with EBS storage for quite some time. To reduce the time taken by
 a commit, I shifted it to a 30 GB SSD instance. It definitely performed
 better on writes and commits. But since last week I have been facing this
 problem of endless back-to-back commits. Not being able to resolve it, I
 have finally switched back to a 34 GB machine with EBS storage, and now
 the commits are working fine, though slow.

The extra 4GB of RAM is almost guaranteed to be the difference.  If your
index continues to grow, you'll probably be having problems very soon
even with 34GB of RAM.  If you could put it on a box with 128 to 256GB
of RAM, you'd likely see your performance increase dramatically.

Can you share your solrconfig.xml file?  I may be able to confirm a
couple of things I suspect, and depending on what's there, may be able
to offer some ideas to help a little bit.  It's best if you use a file
sharing site like dropbox - the list doesn't deal with attachments very
well.  Sometimes they work, but most of the time they don't.

I will reiterate my main point -- you really need a LOT more memory.
Another option is to shard your index across multiple servers.  This
doesn't actually reduce the TOTAL memory requirement, but it is
sometimes easier to get management to agree to buy more servers than it
is to get them to agree to buy really large servers.  It's a paradox
that doesn't make any sense to me, but I've seen it over and over.

Thanks,
Shawn



Re: Commit Issue in Solr 3.4

2014-02-08 Thread Shawn Heisey
On 2/8/2014 10:22 AM, Shawn Heisey wrote:
 Can you share your solrconfig.xml file?  I may be able to confirm a
 couple of things I suspect, and depending on what's there, may be able
 to offer some ideas to help a little bit.  It's best if you use a file
 sharing site like dropbox - the list doesn't deal with attachments very
 well.  Sometimes they work, but most of the time they don't.

One additional idea:  Unless you know through actual testing that you
really do need a 20GB heap, try reducing it.  You have 100 million
documents, so perhaps you really do need a heap that big.

In addition to the solrconfig.xml, I would also be interested in knowing
what memory/GC tuning options you've used to start your java instance,
and I'd like to see a sampling of typical and worst-case query
parameters or URL strings.  I'd need to see all parameters and know
which request handler you used, so I can cross-reference with the config.

Thanks,
Shawn



Re: Commit Issue in Solr 3.4

2014-02-08 Thread Roman Chyla
I would be curious what the cause is. Samarth says that it worked for over
a year /and supposedly docs were being added all the time/. Did the index
grow considerably in the last period? Perhaps he could attach visualvm
while it is in the 'black hole' state to see what is actually going on. I
don't know if the instance is also used for searching, but if it's only
indexing, maybe shorter commit intervals would alleviate the problem.
To add context, our indexer is configured with a 16 GB heap, on a machine
with 64 GB of RAM, but a busy one, so sometimes there is no cache to spare
for the OS. The index is 300 GB (out of which 140 GB is stored values), and
it is working just 'fine' - 30 docs/s on average, but our docs are large
/0.5 MB on avg/ and fetched from two databases, so the slowness is outside
Solr. I didn't see big improvements with a bigger heap, but I don't
remember exact numbers. This is Solr 4.

Roman
On 8 Feb 2014 12:23, Shawn Heisey s...@elyograg.org wrote:


 The extra 4GB of RAM is almost guaranteed to be the difference.  If your
 index continues to grow, you'll probably be having problems very soon
 even with 34GB of RAM.  If you could put it on a box with 128 to 256GB
 of RAM, you'd likely see your performance increase dramatically.

 Can you share your solrconfig.xml file?  I may be able to confirm a
 couple of things I suspect, and depending on what's there, may be able
 to offer some ideas to help a little bit.  It's best if you use a file
 sharing site like dropbox - the list doesn't deal with attachments very
 well.  Sometimes they work, but most of the time they don't.

 I will reiterate my main point -- you really need a LOT more memory.
 Another option is to shard your index across multiple servers.  This
 doesn't actually reduce the TOTAL memory requirement, but it is
 sometimes easier to get management to agree to buy more servers than it
 is to get them to agree to buy really large servers.  It's a paradox
 that doesn't make any sense to me, but I've seen it over and over.

 Thanks,
 Shawn




Re: Commit Issue in Solr 3.4

2014-02-08 Thread Shawn Heisey
On 2/8/2014 11:02 AM, Roman Chyla wrote:
 I would be curious what the cause is. Samarth says that it worked for over
 a year /and supposedly docs were being added all the time/. Did the index
 grow considerably in the last period? Perhaps he could attach visualvm
 while it is in the 'black hole' state to see what is actually going on. I
 don't know if the instance is also used for searching, but if it's only
 indexing, maybe shorter commit intervals would alleviate the problem.
 To add context, our indexer is configured with a 16 GB heap, on a machine
 with 64 GB of RAM, but a busy one, so sometimes there is no cache to spare
 for the OS. The index is 300 GB (out of which 140 GB is stored values),
 and it is working just 'fine' - 30 docs/s on average, but our docs are
 large /0.5 MB on avg/ and fetched from two databases, so the slowness is
 outside Solr. I didn't see big improvements with a bigger heap, but I
 don't remember exact numbers. This is Solr 4.

For this discussion, refer to this image, or the Google Books link where
I originally found it:

https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png

http://books.google.com/books?id=dUiNGYCiWg0Cpg=PA33#v=onepageqf=false

Computer systems have had a long history of performance curves like
this.  Everything goes really well, possibly for a really long time,
until you cross some threshold where a resource cannot keep up with the
demands being placed on it.  That threshold is usually something you
can't calculate in advance.  Once it is crossed, even by a tiny amount,
performance drops VERY quickly.

I do recommend that people closely analyze their GC characteristics, but
jconsole, jvisualvm, and other tools like that are actually not very
good at this task.  You can only get summary info -- how many GCs
occurred and total amount of time spent doing GC, often with a useless
granularity -- jconsole reports the time in minutes on a system that has
been running for any length of time.

I *was* having occasional super-long GC pauses (15 seconds or more), but
I did not know it, even though I had religiously looked at GC info in
jconsole and jstat.  I discovered the problem indirectly, and had to
find additional tools to quantify it.  After discovering it, I tuned my
garbage collection and have not had the problem since.

If you have detailed GC logs enabled, this is a good free tool for
offline analysis:

https://code.google.com/p/gclogviewer/
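
Enabling those detailed GC logs is typically a matter of JVM flags along
these lines (a sketch for Oracle/HotSpot JVMs of that era):

-verbose:gc -Xloggc:/var/log/solr/gc.log -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime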

I have also had good results with this free tool, but it requires a
little more work to set up:

http://www.azulsystems.com/jHiccup

Azul Systems has an alternate Java implementation for Linux that
virtually eliminates GC pauses, but it isn't free.  I do not have any
information about how much it costs.  We found our own solution, but for
those who can throw money at the problem, I've heard good things about it.

Thanks,
Shawn



Re: Max Limit to Schema Fields - Solr 4.X

2014-02-08 Thread Mike L.
That was the original plan.

However, it's important to preserve the originating field that loaded the
value. The data is very fine-grained, and each field stores a particular
value. When searching the data against Solr, it would be important to know
which docs contain that particular data from that particular field
(fielda:value, fieldb:value), whereas searching field:value would hide which
originating field loaded this value.

I'm going to try loading all 3000 fields in the schema and see how that
goes. My only concern is doing boolean searches and whether or not I'll run
into URL length issues, but I guess I'll find out soon.

Thanks again!

Sent from my iPhone

 On Feb 6, 2014, at 1:02 PM, Erick Erickson erickerick...@gmail.com wrote:
 
  Sometimes you can spoof the many-fields problem by using prefixes on the
  data. Rather than fielda, fieldb..., have one field and index values like
  fielda_value, fieldb_value into that single field. Then do the right
  thing when searching. Watch tokenization, though.
 
 Best
 Erick
 On Feb 5, 2014 4:59 AM, Mike L. javaone...@yahoo.com wrote:
 
 
 Thanks Shawn. This is good to know.
 
 
 Sent from my iPhone
 
 On Feb 5, 2014, at 12:53 AM, Shawn Heisey s...@elyograg.org wrote:
 
  On 2/4/2014 8:00 PM, Mike L. wrote:
  I'm just wondering here if there is any defined limit to how many
  fields can be created within a schema? I'm sure the configuration
  maintenance of a schema like this would be a nightmare, but I would like
  to know if it's at all possible in the first place before it is attempted.
 
 There are no hard limits on the number of fields, whether they are
 dynamically defined or not. Several thousand fields should be no
 problem.  If you have enough system resources and you don't run into an
 unlikely bug, there's no reason it won't work.  As you've already been
 told, there are potential performance concerns.  Depending on the exact
 nature of your queries, you might need to increase maxBooleanClauses.
 
 The only hard limitation that Lucene really has (and by extension, Solr
 also has that limitation) is that a single index cannot have more than
 about two billion documents in it - the inherent limitation on a Java
 int type.  Solr can use indexes larger than this through sharding.
 
 See the very end of this page:
 https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations
 
 Thanks,
 Shawn
 


Re: Commit Issue in Solr 3.4

2014-02-08 Thread Roman Chyla
Thanks for the links. I think it would be worth getting more detailed info,
because it could be the performance threshold, or it could be something else
/such as an updated Java version, or something else loosely related to RAM,
e.g. what is held in memory before the commit, what is cached, leaked custom
query objects holding on to some big object, etc./. Btw, if I study the
graph, I see that there *are* warning signs. That's the point of
testing/measuring after all, IMHO.

--roman
On 8 Feb 2014 13:51, Shawn Heisey s...@elyograg.org wrote:


 For this discussion, refer to this image, or the Google Books link where
 I originally found it:

 https://dl.dropboxusercontent.com/u/97770508/performance-dropoff-graph.png

 http://books.google.com/books?id=dUiNGYCiWg0Cpg=PA33#v=onepageqf=false

 Computer systems have had a long history of performance curves like
 this.  Everything goes really well, possibly for a really long time,
 until you cross some threshold where a resource cannot keep up with the
 demands being placed on it.  That threshold is usually something you
 can't calculate in advance.  Once it is crossed, even by a tiny amount,
 performance drops VERY quickly.

 I do recommend that people closely analyze their GC characteristics, but
 jconsole, jvisualvm, and other tools like that are actually not very
 good at this task.  You can only get summary info -- how many GCs
 occurred and total amount of time spent doing GC, often with a useless
 granularity -- jconsole reports the time in minutes on a system that has
 been running for any length of time.

 I *was* having occasional super-long GC pauses (15 seconds or more), but
 I did not know it, even though I had religiously looked at GC info in
 jconsole and jstat.  I discovered the problem indirectly, and had to
 find additional tools to quantify it.  After discovering it, I tuned my
 garbage collection and have not had the problem since.

 If you have detailed GC logs enabled, this is a good free tool for
 offline analysis:

 https://code.google.com/p/gclogviewer/

 I have also had good results with this free tool, but it requires a
 little more work to set up:

 http://www.azulsystems.com/jHiccup

 Azul Systems has an alternate Java implementation for Linux that
 virtually eliminates GC pauses, but it isn't free.  I do not have any
 information about how much it costs.  We found our own solution, but for
 those who can throw money at the problem, I've heard good things about it.

 Thanks,
 Shawn




A bit lost in the land of schemaless Solr

2014-02-08 Thread Benson Margulies
Say that I have 10 fieldTypes for 10 languages. Is there a way to associate
a naming convention from field names to field types so that I can avoid
bothering with all those dynamic fields?
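
For context, the convention I'm trying to avoid looks roughly like this in
schema.xml, one dynamicField per language (the suffixes are illustrative):

<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true"/>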


(lack) of error for missing library?

2014-02-08 Thread Benson Margulies
  <!-- an exact 'path' can be used instead of a 'dir' to specify a
       specific jar file.  This will cause a serious error to be logged
       if it can't be loaded.
  -->

is the comment, but when I put a completely missing path in there -- no
error. Should I file a JIRA?
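
For reference, the directive that comment documents is the <lib> element in
solrconfig.xml; what I tried is roughly this (the path is illustrative):

<lib path="/path/to/a-jar-that-does-not-exist.jar" />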


Re: Max Limit to Schema Fields - Solr 4.X

2014-02-08 Thread Jack Krupansky
It's safe to say that you are on thin ice - try it and if it works well for 
your specific data and requirements, great. But if it eventually fails, you 
can't say we didn't warn you!


Generally, I would say that a few hundred fields is the recommended
limit - not that more will definitely cause problems, but because you will
be beyond common usage and increasingly sensitive to the amount of data and
Java/JVM performance capabilities.


-- Jack Krupansky

-Original Message- 
From: Mike L.

Sent: Saturday, February 8, 2014 2:12 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Max Limit to Schema Fields - Solr 4.X

That was the original plan.

However, it's important to preserve the originating field that loaded the
value. The data is very fine-grained, and each field stores a particular
value. When searching the data against Solr, it would be important to know
which docs contain that particular data from that particular field
(fielda:value, fieldb:value), whereas searching field:value would hide which
originating field loaded this value.

I'm going to try loading all 3000 fields in the schema and see how that
goes. My only concern is doing boolean searches and whether or not I'll run
into URL length issues, but I guess I'll find out soon.


Thanks again!

Sent from my iPhone







Re: A bit lost in the land of schemaless Solr

2014-02-08 Thread Jack Krupansky

File a Jira? (Really!)

Yeah, the absolute separation of field and field type can be a bit annoying
in some of these dynamic cases. It might be nice to have a hybrid - like a
dynamic field with type details as nested content. I mean, it does seem like
one of the common use cases of dynamic fields - the pattern suffix specifies
the specific type.

Or, maybe the ability to specify a dynamic field by directly referencing the
type, like: type:text_fr.

Or, just completely merge field and type so you can have any degree of
hybrid you want, including pure field, pure type, field plus analyzer,
etc. - more like a type merely inheriting attributes from another
field/type.


-- Jack Krupansky

-Original Message- 
From: Benson Margulies

Sent: Saturday, February 8, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: A bit lost in the land of schemaless Solr

Say that I have 10 fieldTypes for 10 languages. Is there a way to associate
a naming convention from field names to field types so that I can avoid
bothering with all those dynamic fields? 



Re: Max Limit to Schema Fields - Solr 4.X

2014-02-08 Thread Shawn Heisey
On 2/8/2014 12:12 PM, Mike L. wrote:
 Im going to try loading all 3000 fields in the schema and see how that goes. 
 Only concern is doing boolean searches and whether or not Ill run into URL 
 length issues but I guess Ill find out soon.

It will likely work without a problem.  As already mentioned, you may
need to increase maxBooleanClauses in solrconfig.xml beyond the default
of 1024.
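
That is a one-line setting in the <query> section of solrconfig.xml, e.g.:

<maxBooleanClauses>4096</maxBooleanClauses>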

The max URL size is configurable with any decent servlet container,
including the jetty that comes with the Solr example.  In the part of
the jetty config that adds the connector, this increases the max HTTP
header size to 32K, and the size for the entire HTTP buffer to 64K.
These may not be big enough with 3000 fields, but it gives you the
general idea:

<Set name="requestHeaderSize">32768</Set>
<Set name="requestBufferSize">65536</Set>

Another option is to use a POST request instead of a GET request with
the parameters in the posted body.  The default POST buffer size in
Jetty is 200K.  In newer versions of Solr, the limit is actually set by
Solr, not the servlet container, and defaults to 2MB.  I believe that if
you are using SolrJ, it uses POST requests by default.

Thanks,
Shawn



RE: Highlight results in Arabic are backword

2014-02-08 Thread Fatima Issawi
Thank you both for responding.

Is there a way to tell Solr to add those attributes on the field when it
returns results (e.g. language is Arabic or English, or direction is LTR or
RTL)?

Right now I only have Arabic content indexed, but we plan to add English in
the near future. I don't want to have to redo everything later if there is
a better way of designing this now.

Regards,
Fatima

 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Friday, February 07, 2014 3:48 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Highlight results in Arabic are backword
 
 Arabic is complex. Basically, don't trust anything you see until you put
 that content on the screen with the surrounding tag marked with the
 attribute dir='rtl' (e.g. <p dir='rtl'>arabic text</p>).
 
 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at once.
 Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Thu, Feb 6, 2014 at 10:12 PM, Steve Rowe sar...@gmail.com wrote:
  Hi Fatima,
 
  I don’t think there’s an actual problem, it just looks like it because the
 program you’re using to look at the JSON makes a different choice for laying
 out the highlighting results than it does for the field values.
 
  In fact, all the bytes are the same, and in the same order for both the
 “author” field text and the highlighting text, though some space characters
 are ASCII space (U+0020) in one and non-breaking space (U+00A0) in the
 other.
 
  By the way, I see the same thing as you in my email client (OS X Mail.app). I
 assume there is a rule shared by our programs about complex layout like this,
 where right-to-left text is mixed with left-to-right text, likely based on the
 proportion of each, that triggers a left-to-right word sequencing instead of
 the expected right-to-left word sequencing.
 
  Anyway, I pulled out the author field and highlighting texts into an HTML
 document and viewed it in my browser (Safari), and both are laid out the
 same (with the exception of the emphasis given the highlighted word):
 
  ——
  <html>
  <body>
  <p>author: "د. فيشر السعر",</p>
  <p>highlighting: { "1": { "author": [ "د. <em>فيشر</em> السعر" ] } }</p>
  </body>
  </html>
  ——
 
  Steve
 
  On Feb 6, 2014, at 8:23 AM, Fatima Issawi issa...@qu.edu.qa wrote:
 
  Hello,
 
  I am getting highlight results in Arabic, but the order of the words is
 backwards. Querying on that field gives me the correct result, though. Is
 there a setting I'm missing?
 
  An extract from an example query from my Solr Console is below:
 
  {
    "responseHeader": {
      "status": 0,
      "QTime": 1,
      "params": {
        "indent": "true",
        "q": "author:\"فيشر\"",
        "_": "1391692704242",
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "hl.fl": "author",
        "wt": "json",
        "hl": "true"
      }
    },
    "response": {
      "numFound": 4,
      "start": 0,
      "docs": [
        {
          "pagenumber": 1,
          "id": 1,
          "author": "د. فيشر السعر",
          "author_s": "د. فيشر السعر",
          "collector": "فاطمة عيساوي",
        },
        "highlighting": {
          "1": {
            "author": [
              "د. <em>فيشر</em> السعر"
            ]
 


Re: Highlight results in Arabic are backword

2014-02-08 Thread Alexandre Rafalovitch
You will most probably put your English and Arabic content into
different fields. Mostly because you will want to apply different
field type definitions to your English and Arabic text (tokenizers,
etc).
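A rough schema.xml sketch of that split; the field names are illustrative, and
the text_ar type below mirrors the Arabic analysis chain in the stock example
schema:

<field name="content_en" type="text_en" indexed="true" stored="true"/>
<field name="content_ar" type="text_ar" indexed="true" stored="true"/>

<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>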

Also, since you are doing some deliberate design now, I would search the
web for articles on multilingual approaches to Solr. There are some deeper
issues. Some good questions are covered here:
http://info.basistech.com/blog/bid/171842/Indexing-Strategies-for-Multilingual-Search-with-Solr-and-Rosette
(even though it discusses a commercial tool). There is also a series of
12 blog posts on handling CJK in Solr in the library world.
Your issues will be different, but there will be overlap.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


RE: Highlight results in Arabic are backword

2014-02-08 Thread Fatima Issawi
Thank you. I will look into that.
