Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Shawn Heisey
On 6/26/2013 11:25 PM, Sandeep Gupta wrote:
 To have a singleton design pattern for SolrServer object creation,
 I found that there are many ways described in
 http://en.wikipedia.org/wiki/Singleton_pattern
 So which of the 5 examples mentioned in the above URL is the best one for a
 web application in general practice?
 
 I am sure lots of people (on this mailing list) will have practical
 experience
 with which type of singleton pattern needs to be implemented for creating
 the SolrServer object.

I will admit that when I used the word singleton I honestly hadn't
looked it up to see what it really meant.  If you do use the full
meaning of singleton, you can do this in any way you want.

Perhaps a better thing to say is that you only need one SolrServer
object for each base URL (host/port/core combination).  Things are a
little bit different when it comes to SolrCloud - you can use one
CloudSolrServer object for the entire cloud, even if there are many
collections and many servers.

In my own SolrJ code, I create two HttpSolrServer objects within each of
my homegrown Core objects.  One of them is for operations against that
specific Solr core, the other is for CoreAdmin operations.

Because the URL for CoreAdmin operations is common to multiple cores, I
create a static Map with those server objects so that my Core objects
can share the SolrServer object used for CoreAdmin when they are on the
same server machine.

For the query side, if you're in a situation where you have one access
point to your Solr installation (a load balancer in front of replicating
Solr servers) and you only have one index, then you could create a
single static SolrServer object for your entire application.
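
For illustration, a minimal SolrJ sketch of that "one object per base URL"
idea (class and method names here are made up for the example, not taken from
my actual code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrServers {
    // one shared, thread-safe SolrServer per base URL, created lazily and reused
    private static final Map<String, SolrServer> SERVERS =
            new ConcurrentHashMap<String, SolrServer>();

    public static synchronized SolrServer get(String baseUrl) {
        SolrServer server = SERVERS.get(baseUrl);
        if (server == null) {
            server = new HttpSolrServer(baseUrl);
            SERVERS.put(baseUrl, server);
        }
        return server;
    }
}

Callers then share the same instance, e.g.
SolrServer core = SolrServers.get("http://host:8983/solr/collection1");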

Thanks,
Shawn



Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I have a relational database model.
This is the basic structure of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
        inner join TableB on ...">
  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
          where ResourceId = '${MyMainEntity.pId}'"></entity>
  <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
          from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
  <entity name="LibraryItem" pk="ResourceId"
          query="select SKU
                 FROM [TableB]
                 INNER JOIN ...
                 ON ...
                 INNER JOIN ...
                 ON ...
                 WHERE ... AND ...">
  </entity>
</entity>

Now, this takes a lot of time.
1 rows come back from the first query, and then the inner entities are
fetched for each of them (around 10 rows each).

If I use a DB profiler I see the three inner-entity queries running over
and over (3 select statements, then again 3 select statements, over and over).
This is really not efficient,
and the import can run over 40 hrs.
Now,
what are my options to make it run faster?
1. Obviously there is an option to flatten the tables into one big table - but
that will create a lot of other side effects.
I would really like to avoid that extra effort and run Solr on my
production relational tables.
So far it works great out of the box, and I am asking here if there
is a configuration tweak.
2. If I do flatten the rows - does the schema.xml need to be changed
too? Or will the fields that are multivalued keep being multivalued?

Thanks.


Re: Is there a way to speed up my import

2013-06-27 Thread Gora Mohanty
On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:

 I have a relational database model
 This is the basics of my data-config.xml

 <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
         inner join TableB on ...">
   <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
           where ResourceId = '${MyMainEntity.pId}'"></entity>
   <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
           from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
   <entity name="LibraryItem" pk="ResourceId"
           query="select SKU
                  FROM [TableB]
                  INNER JOIN ...
                  ON ...
                  INNER JOIN ...
                  ON ...
                  WHERE ... AND ...">
   </entity>
 </entity>

 Now, this takes a lot of time.
 1 rows come back from the first query, and then the inner entities are
 fetched for each of them (around 10 rows each).

 If I use a DB profiler I see the three inner-entity queries running over
 and over (3 select statements, then again 3 select statements, over and over).
 This is really not efficient,
 and the import can run over 40 hrs.
 Now,
 what are my options to make it run faster?
 1. Obviously there is an option to flatten the tables into one big table - but
 that will create a lot of other side effects.
 I would really like to avoid that extra effort and run Solr on my
 production relational tables.
 So far it works great out of the box, and I am asking here if there
 is a configuration tweak.
 2. If I do flatten the rows - does the schema.xml need to be changed
 too? Or will the fields that are multivalued keep being multivalued?

You have not shared your actual queries, so it is difficult
to tell, but my guess would be that it is the JOINs that
are the bottleneck rather than the SELECTs. You should
start with the following:
1. Profile queries from the database back-end to see
which are taking the most time, and try to simplify
them.
2. Make sure that relevant database columns are indexed.
This can make a huge difference, though going overboard
and indexing all columns might be counter-productive.
3. Use Solr DIH's CachedSqlEntityProcessor (see the sketch below):
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
4. Measure the time that Solr indexing takes: from your
description, you seem to be guessing at it.

In general, you should not flatten the records in the
database as that is supposed to be relational data.
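
As an illustration of point 3, an inner entity from your config could be
switched to the cached processor roughly like this (a sketch only; table and
column names are copied from your post, and note that the cache key column
has to be selected in the query):

<entity name="Entity1" pk="Id1"
        processor="CachedSqlEntityProcessor"
        query="SELECT ResourceId, [Text] Tag from [Table2]"
        where="ResourceId=MyMainEntity.pId">
</entity>

With this, the sub-query runs once, and rows for each parent are then looked
up from the in-memory cache instead of issuing one SELECT per parent row.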

Regards,
Gora


Solr admin search with wildcard

2013-06-27 Thread Amit Sela
I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards take a lot of resources, but I'm not worried
about that.

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Upayavira
I have done this - upgraded a 1.4 index to 3.x then on to 4.x. It
worked, but...

New field types have been introduced over time that facilitate new
functionality. To continue to use an upgraded index, you need to
continue using the old field types, and thus lose some of the coolness
of newer versions.

So, a re-index will set you in far better stead, if it is at all
possible.

Upayavira

On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:
 bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes
 
 Solr/Lucene explicitly try to read _one_ major revision backwards.
 Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
 able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
 Solr 1.4 indexes, so I wouldn't even try.
 
 Shalin's comment is best. If at all possible I'd just forget about
 reading the old index and re-index from scratch. But if you _do_
 try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
 at each step. That'll (I think) rewrite all the segments in the
 current format.
 
 Good luck!
 Erick
 
 On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
  You must carefully go through the upgrade instructions starting from
  1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
  3.1 to 4.0 should be given special attention.
 
  On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com wrote:
  Hello All,
 
  We are planning to migrate solr 1.4 to Solr 4.3 version.
  And I am seeking some help in this side.
 
  Considering Schema file change:
  By default there are lots of changes if I compare the original Solr 1.4 schema
  file to the Solr 4.3 schema file.
  And that is the reason we are not copy-pasting the schema file.
  In our Solr 1.4 schema implementation, we have some custom fields with type
  textgen and text
  So in migration of these custom fields to Solr 4.3,  should I use type of
  text_general as replacement of textgen and
  text_en as replacement of text?
  Please confirm the same.
 
  Please check the text_general definition in 4.3 against the textgen
  fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
  and text.
 
 
  Considering Solrconfig change:
  As we didn't have lots of changes in 1.4 solrconfig file except the
  dataimport request handler.
  And therefore on the migration side, we are simply modifying the Solr 4.3
  solrconfig file with this request handler.
 
  And you need to add the dataimporthandler jar into Solr's lib
  directory. DIH is not added automatically anymore.
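
  For illustration, the relevant solrconfig.xml pieces look roughly like this
  (a sketch; the lib path depends on where you put the jar, and the handler
  name is just the conventional one):

  <lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>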
 
 
  Considering the application development:
 
  We used all the queries in a BOOLEAN style (which was not good), I mean we put
  all the parameters in the query field, i.e.
  *:* AND EntityName:  AND fileName:fieldValue AND .
 
  I think we should simplify our queries using other fields like df, qf 
 
 
  Probably. AND queries are best done by filter queries (fq).
 
  We also used to create Solr server object via CommonsHttpSolrServer() so I
  am planning to use now HttpSolrServer API
 
  Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
  the javabin format so old clients using javabin won't be able to
  communicate with Solr until you upgrade both solr client and solr
  servers.
 
 
  Please let me know the suggestion for above points also what are the other
  factors I need to take care while considering the migration.
 
  There is no substitute for reading the upgrade sections in the changes.txt.
 
  I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
  will most likely need to re-index your documents.
 
  You should also think about switching to SolrCloud to take advantage
  of its features.
 
  --
  Regards,
  Shalin Shekhar Mangar.


Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Dotan Cohen
On a Solr 4.1 install I see that queries which use the fq parameter
take a long time (upwards of 120 seconds), both with the standard Lucene
query parser and also with edismax. I have added the {!cache=false}
localparam to the filter query, but this does not speed up the query.
Putting all the search terms in the main query returns results in
milliseconds.

Note that I am not using any wildcard queries, in each case I am
specifying the field to search and the terms to search on. Where
should I start to debug?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Guido Medina
I'm not a Hibernate fan either, to be honest, but in the Java world if
you have a good model-oriented design I'm sure you would prefer to map it to a
DB using JPA2, for example. In our case we use EclipseLink, which as a
JPA2 implementation I find simpler and faster than Hibernate. Now, I'm not sure
how many JPA2 implementations can be integrated with Solr/Lucene; several
years ago I developed a project nicely using Hibernate + Hibernate
Search with just Lucene (no Solr server).


In fact I have to apologize for advising Hibernate, but for some people
it might be a good start. Our company uses a polyglot design where I
have Riak + EclipseLink (objects mapped to PostgreSQL + an interceptor to
Riak), and for some objects Solr. I wish it was via annotations like in
Hibernate Search, because it is pretty ugly to convert back and forth to JSON
without any automation.


All this said, I too care about performance, but sometimes we want less
code, design patterns, and things to happen automatically, so Hibernate +
Hibernate Search (if that's the only capable implementation) might not
be a bad idea at all.


Guido.

On 27/06/13 03:14, Otis Gospodnetic wrote:

If hibernate search is like regular hibernate ORM I'm not sure I'd
trust it to pick the most optimal solutions...

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Jun 26, 2013 4:44 PM, Guido Medina guido.med...@temetra.com wrote:


Never heard of embedded Solr server, isn't it better to just use Lucene alone
for that purpose? Using a helper like Hibernate? Since most applications
that require indexes will have a relational DB behind the scenes, it would
not be a bad idea to use an ORM combined with Lucene annotations (aka
hibernate-search).

Guido.

On 26/06/13 20:30, Alexandre Rafalovitch wrote:


Yes, it is possible by running an embedded Solr inside SolrJ process.
The nice thing is that the index is portable, so you can then access
it from the standalone Solr server later.

I have an example here:
https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj
, which shows SolrJ running both as a client and with an embedded
container. Notice that you will probably need more jars than you
expect for the standalone Solr to work, including a number of servlet
jars.
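
For reference, the embedded setup is along these lines (a minimal sketch, not
taken from the book's code; the solr home path and core name are assumptions,
and the CoreContainer bootstrap details vary a little between 4.x releases):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
    public static void main(String[] args) throws Exception {
        // solr home contains solr.xml plus the core's conf/ (solrconfig.xml, schema.xml)
        System.setProperty("solr.solr.home", "/path/to/solr-home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        SolrServer server = new EmbeddedSolrServer(container, "collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();

        container.shutdown();
    }
}

The resulting index directory can afterwards be copied under a standalone
Solr core, which is what makes it portable.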

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: 
http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:


I currently have a SOLRJ program which I am using for indexing the data
in
SOLR. I am trying to figure out a way to build index without depending on
running instance of SOLR. I should be able to supply the solrconfig and
schema.xml to the indexing program which in turn create index files that
I
can use with any SOLR instance. Is it possible to implement this?



--
View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383.html
Sent from the Solr - User mailing list archive at Nabble.com.





Data Import Handler and Extract Handler

2013-06-27 Thread Venter, Scott
Hi all,

I am new to SOLR. I have been working through the SOLR 4 Cookbook and my 
experiences so far have been great.

I have worked through the extraction of PDF data recipe, and the Data import 
recipe. I would now like to join these two things, i.e. I would like to do a 
data import from a Database table of users, and then somehow associate indexed 
PDF data with rows that were imported.

I have a conceptual link between rows in the database and pdf documents, but I 
don't know how to make a physical link between the two in SOLR. For example, I 
know that user x has pdf documents a, b and c. 

If I have imported my users into SOLR using Data Import Handler, how would I

1) import and associate the pdf documents using the extract mechanism, in such 
a way that there is a link between user x and the 3 pdf documents as described 
above?

2) is there a better way to join a table of users to a set of pdf documents?

Thanks in advance
Scott.
This e-mail is subject to a disclaimer, available at 
http://www.rmb.co.za/web/elements.nsf/online/disclaimer-communications.html


Re: Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I just configured it with the caching and it works mighty fast now.
Instead of an unbelievable number of queries it queries only 4 times.
CPU usage has moved from the DB to the Solr machine, but only for a very
short time.

Problem:
I don't see the multi-value fields (inner entities) anymore.
This is my configuration:

<entity name="PackageVersion" pk="PackageVersionId"
        query="select PackageVersion.Id PackageVersionId, ... from ...">
  <entity name="PackageTag" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="ResourceId =
          '${PackageVersion.PackageId}'"
          query="SELECT [Text] PackageTag from [dbo].[Tag]">
  </entity>
  <entity name="PackageVersionTag" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="ResourceId =
          PackageVersion.PackageVersionId"
          query="SELECT [Text] PackageVersionTag from [dbo].[Tag]">
  </entity>
  <entity name="LibraryItem" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="Asset.[PackageVersionId] =
          PackageVersion.PackageVersionId"
          query="select CatalogVendorPartNum SKU, LibraryItems.[Description] SKUDescription
                 FROM ...
                 INNER JOIN ...
                 ON Asset.Id = LibraryVendors.DesignProjectId
                 INNER JOIN ...
                 ON LibraryVendors.LibraryVendorId = LibraryItems.LibraryVendorId
                 WHERE Asset.[AssetTypeId]=1">
  </entity>
</entity>

Now, when I query
http://localhost:8983/solr/vaultCache/select?q=*&indent=true
it returns only the main entity attributes.
Where are my inner entities' attributes now?
Thanks a lot.







On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty g...@mimirtech.com wrote:

 On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:
 
  I have a relational database model
  This is the basics of my data-config.xml
 
  <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
          inner join TableB on ...">
    <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
            where ResourceId = '${MyMainEntity.pId}'"></entity>
    <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
            from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
    <entity name="LibraryItem" pk="ResourceId"
            query="select SKU
                   FROM [TableB]
                   INNER JOIN ...
                   ON ...
                   INNER JOIN ...
                   ON ...
                   WHERE ... AND ...">
    </entity>
  </entity>
 
  Now, this takes a lot of time.
  1 rows in the first query and then each other inner entities are
  fetched later (around 10 rows each).
 
  If I use a db profiler I see a the three inner entities query running
 over
  and over (3 select sentences than again 3 select sentences over and over)
  This is really not efficient.
  And the import can run over 40 hrs ()
  Now,
  What are my options to run it faster .
  1. Obviously there is an option to flat the tables to one big table - but
  that will create a lot of other side effects.
  I would really like to avoid that extra effort and run solr on my
  production relational tables.
  So far it works great out of the box and I am searching here if there
  is a configuration tweak.
  2. If I will flat the rows that - does the schema.xml need to be change
  too? or the same fields that are multivalued will keep being multivalued.

 You have not shared your actual queries, so it is difficult
 to tell, but my guess would be that it is the JOINs that
 are the bottle-neck rather than the SELECTs. You should
 start by:
 1. Profile queries from the database back-end to see
 which are taking the most time, and try to simplify
 them.
 2. Make sure that relevant database columns are indexed.
 This can make a huge difference, though going overboard
  in indexing all columns might be counter-productive.
 3. Use Solr DIH's CachedSqlEntityProcessor:
 http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
 4. Measure the time that Solr indexing takes: From your
 description, you seem to be guessing at it.

 In general, you should not flatten the records in the
 database as that is supposed to be relational data.

 Regards,
 Gora



Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Upayavira
If what you want to do is create an index that can later be used by
Solr, then create the index with Solr. Solr has constraints on how a
Lucene index is created that you would have to replicate, and that would
create a huge amount of work.

SolrJ does have the 'embedded mode' in which Solr itself runs in the
same JVM as the client - i.e. no HTTP transport.  It could be a useful
way to do off-line index creation.

I've never used it though, so can't vouch for it.

Upayavira

On Wed, Jun 26, 2013, at 09:43 PM, Guido Medina wrote:
 Never heard of embedded Solr server, isn't it better to just use Lucene
 alone for that purpose? Using a helper like Hibernate? Since most
 applications that require indexes will have a relational DB behind the
 scenes, it would not be a bad idea to use an ORM combined with Lucene
 annotations (aka hibernate-search).
 
 Guido.
 
 On 26/06/13 20:30, Alexandre Rafalovitch wrote:
  Yes, it is possible by running an embedded Solr inside SolrJ process.
  The nice thing is that the index is portable, so you can then access
  it from the standalone Solr server later.
 
  I have an example here:
  https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj
  , which shows SolrJ running both as a client and with an embedded
  container. Notice that you will probably need more jars than you
  expect for the standalone Solr to work, including a number of servlet
  jars.
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
  On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:
  I currently have a SOLRJ program which I am using for indexing the data in
  SOLR. I am trying to figure out a way to build index without depending on
  running instance of SOLR. I should be able to supply the solrconfig and
  schema.xml to the indexing program which in turn create index files that I
  can use with any SOLR instance. Is it possible to implement this?
 
 
 
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: shard failure, leader transition took 11s (seems high?)

2013-06-27 Thread Daniel Collins
On thinking about this, isn't it a potentially more serious problem,
especially in view of the NRT support which Solr now offers?

If a server crashes (hard), ZK detects this using the heartbeat, and would
remove the /live_node, which would trigger a leader election for this
shard.
But if we soft shut it down, it seems that again we have to wait for the
instance to physically die (and the live_node to disappear) before we get a
leadership election.  For all the time between Jetty shutting down and that
happening, we have no valid leader for that shard (but ZK and the rest of
the cloud think we do).

Now searches to that shard are distributed round-robin (using the standard
Solr load balancer within Solr cloud) so they will see the failed node, and
immediately retry to another replica (and presumably work).
However, updates keep going to the (now dead) leader, shouldn't that error
in SolrCmdDistributor (forwarding update to
http://xx4:10600/solr/collection1/ failed - retrying) trigger an
election?  Retrying to a node which isn't available works if it was a
transient issue, but is that the more common case?

Maybe we have a more specialized case than most, but we have very frequent
updates and want (near) real-time indexing, we are trying to minimize
latency between index and search. We currently soft-commit every 1s to do
that and we might get several hundred stories during that second, so
failing all updates for 11s in our case is a serious issue.  I know the
Cloud has returned an error code so we know the updates have failed, but at
our application level, there is nothing else we can do, surely? Solr has to
send to the leader, but the leader isn't available, so shouldn't the cloud
be handling that?



On 24 June 2013 14:58, Daniel Collins danwcoll...@gmail.com wrote:

 Thanks Mark.  Yes, I expected some finite time for the leader to take
 over, just hadn't realized/comprehended that Jetty was already shutdown by
 this point...  Yes, I suppose the container has to stop sending requests to
 the context before it can shut the context down, so that's the window where
 the individual container knows its going down, but nothing else does (yet).
  Will try to have a think about that shutdown/stop API, I suspect we'll
 need it for production (yes we can retry but we are using soft-commit to
 get a NRT as we can, so a 10s pause isn't really acceptable in our case).


 On 24 June 2013 14:46, Mark Miller markrmil...@gmail.com wrote:

 It will take a short bit of a time before a new leader takes over when a
 leader goes - that's expected - how long it takes will vary. Some things
 will do short little retries to kind of deal with this, but you are alerted
 those updates failed, so you have to deal with that as you would other
 update fails on the client side. SolrCloud favors consistency over write
 availability. That's the short part where you lose write availability.

 To get a 'clean' shutdown - eg you want to bring the machine down, it
 didn't get hit by lightning - we have to add some specific clean stop API
 you can call first - by the time jetty (or whatever container) tells Solr
 it's shutting down, it's too late to pull the node out gracefully.

 I've danced around it in the past, but have never gotten to making that
 clean shutdown/stop API.

 - Mark

 On Jun 24, 2013, at 8:25 AM, Daniel Collins danwcoll...@gmail.com
 wrote:

  Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829
  patch), 4 shards, 2 replicas (leader + 1 other) per shard spread across
 8
  machines.
 
  We sent all our updates into a single instance, and we shutdown a leader
  for maintenance, expecting it to failover to the other replica.  What I
 saw
  was that when the leader shard went down, the instance taking updates
  started seeing rejections almost instantly, yet the cluster state
 changes
  didn't occur for several seconds.  During that time, we had no valid
 leader
  for one of our shards, so we were losing updates and queries.
 
  (shard4 leader)
  07:10:33,124 - xx4 (shard 4 leader) starts coming down.
  07:10:35,885 - cluster state change is detected
  07:10:37,172 - nsrchnj4 publishes itself as down
  07:10:37,869 - second cluster state change detected
  07:10:40,202 - closing searcher
  07:10:43,447 - cluster state change (live_nodes)
 
  (instance taking updates)
  07:10:33,443 - starts seeing rejections from xx4
  07:10:35,937 - detects a cluster state change (red herring)
  07:10:37,899 - detects another cluster state change
  07:10:43,478 - detects a live_nodes change (as shard4 leader is really
 down
  now)
  07:10:44,586 - detects that shard4 has no leader anymore
 
  (x8) - new shard4 leader
 
  07:10:32,981 - last story FROMLEADER (xx4)
  07:10:35,980 - cluster state change detected (red herring)
  07:10:37,975 - another cluster state change detected
  07:10:43,868 - running election process(!)
  07:10:44,069 - nsrchnj8 becomes leader, tries to sync from nsrchnj4
 (which
  is 

Re: Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Upayavira
can you give an example?

On Thu, Jun 27, 2013, at 09:08 AM, Dotan Cohen wrote:
 On a Solr 4.1 install I see that queries with use the fq parameter
 take a long time (upwards of 120 seconds), both on the standard Lucene
 query parser and also with edismax. I have added the {!cache=false}
 localparam to the filter query, but this does not speed up the query.
 Putting all the search terms in the main query returns results in
 miliseconds.
 
 Note that I am not using any wildcard queries, in each case I am
 specifying the field to search and the terms to search on. Where
 should I start to debug?
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com


Re: Solr, Shards, multi cores and (reverse proxy)

2013-06-27 Thread medley

* I have created a new RequestHandler and added the list of the shards :

...
<str name="shards">localhost:8780/apache-solr/leg0,localhost:8780/apache-solr/leg1,localhost:8780/apache-solr/leg2,localhost:8780/apache-solr/leg3,localhost:8780/apache-solr/leg4,localhost:8780/apache-solr/leg5</str>
...


* In the url, I replaced shards=... by shards.qt

It is working well.
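
For anyone following along, the handler definition would be roughly like this
(a sketch; the handler name and the abbreviated shard list are placeholders):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">localhost:8780/apache-solr/leg0,localhost:8780/apache-solr/leg1,...</str>
  </lst>
</requestHandler>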

Thanks a lot for your help.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-cores-and-reverse-proxy-tp4072094p4073543.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Hi,

I have a desktop application where I am abusing Solr as an embedded
database, and I am quite happy with everything.
Performance is more than good enough for my use case, and Solr's query
capabilities match the requirements of my app quite well. However, I
get the well-known performance warnings (see subject) in the log
whenever I index a lot of documents, although I never experience any
performance problems (they might be hidden, though). The properties of my
app are:

- I (soft-)commit after every indexed item because I need the changes
to be visible immediately
- The commits are serialized
- I do not have any warming queries configured

I have read the FAQ but don't see anything that helps in my case. As I
said, I am happy with everything as it is, but the warning makes me a
bit nervous (and maybe at some point my customers too, when their logs are
full of those warnings). What could I do to eliminate it? Can I
configure only one searcher to be used, or anything like that?

Thanks for any hints,

Robert


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Upayavira
As much as possible, use new configs. Take fieldType definitions from
your 4.x example dir, don't use the old ones. e.g. if you use the old
date field type, it won't be usable in various ways (e.g. in the MS()
function).

Upayavira

On Thu, Jun 27, 2013, at 11:00 AM, Sandeep Gupta wrote:
 Thanks again Shawn for your comments.
 
 I am a little worried about the multi-threading of the web application, which
 uses
 servlets.
 
 I also found one of your explanations (please confirm whether it is
 your comment) in
 http://lucene.472066.n3.nabble.com/Memory-problems-with-HttpSolrServer-td4060985.html
 for the question:
 http://stackoverflow.com/questions/11931179/httpsolrserver-instance-management
 
 As you correctly said, the creation of SolrServer objects depends on the
 number
 of shards/solrcores, and only after that does one need to think about an
 implementation which may use a singleton pattern.
 
 On my web application side, I have only one solrcore, which is the default
 one,
 collection1, so I will create one SolrServer object for my application.
 Sure, if we decide to go for SolrCloud, then I will also create just one
 object.
 
 Thanks Upayavira, yes I will do the re-index. Is there anything you want to
 suggest,
 as you did the same migration?
 
 Thanks
 Sandeep
 
 
 
 
 
 
 
 
 
 On Thu, Jun 27, 2013 at 1:33 PM, Upayavira u...@odoko.co.uk wrote:
 
  I have done this - upgraded a 1.4 index to 3.x then on to 4.x. It
  worked, but...
 
  New field types have been introduced over time that facilitate new
  functionality. To continue to use an upgraded index, you need to
   continue using the old field types, and thus lose some of the coolness
  of newer versions.
 
  So, a re-index will set you in far better stead, if it is at all
  possible.
 
  Upayavira
 
  On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:
   bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes
  
   Solr/Lucene explicitly try to read _one_ major revision backwards.
   Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
   able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
   Solr 1.4 indexes, so I wouldn't even try.
  
   Shalin's comment is best. If at all possible I'd just forget about
   reading the old index and re-index from scratch. But if you _do_
    try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
   at each step. That'll (I think) rewrite all the segments in the
   current format.
  
   Good luck!
   Erick
  
   On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
   shalinman...@gmail.com wrote:
You must carefully go through the upgrade instructions starting from
1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
3.1 to 4.0 should be given special attention.
   
On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com
  wrote:
Hello All,
   
We are planning to migrate solr 1.4 to Solr 4.3 version.
And I am seeking some help in this side.
   
Considering Schema file change:
By default there are lots of changes if I compare original Solr 1.4
  schema
file to Sol 4.3 schema file.
And that is the reason we are not copying paste of schema file.
In our Solr 1.4 schema implementation, we have some custom fields
  with type
textgen and text
So in migration of these custom fields to Solr 4.3,  should I use
  type of
text_general as replacement of textgen and
text_en as replacement of text?
Please confirm the same.
   
Please check the text_general definition in 4.3 against the textgen
fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
and text.
   
   
Considering Solrconfig change:
As we didn't have lots of changes in 1.4 solrconfig file except the
dataimport request handler.
And therefore in migration side, we are simply modifying the Solr 4.3
solrconfig file with his request handler.
   
And you need to add the dataimporthandler jar into Solr's lib
directory. DIH is not added automatically anymore.
   
   
Considering the application development:
   
We used all the queries as BOOLEAN type style (was not good)  I mean
  put
all the parameter in query fields i.e
*:* AND EntityName:  AND fileName:fieldValue AND .
   
I think we should simplify our queries using other fields like df, qf
  
   
   
Probably. AND queries are best done by filter queries (fq).
   
We also used to create Solr server object via CommonsHttpSolrServer()
  so I
am planning to use now HttpSolrServer API
   
Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
the javabin format so old clients using javabin won't be able to
communicate with Solr until you upgrade both solr client and solr
servers.
   
   
Please let me know the suggestion for above points also what are the
  other
factors I need to take care while considering the migration.
   
There is no substitute for reading the upgrade sections 

Searching and Retrieving Information Protocol For Solr

2013-06-27 Thread Furkan KAMACI
There is a low-level protocol, called Z39.50, that defines a client-server
protocol for searching and retrieving information from remote computer
databases. Since Solr is a commonly used search engine (besides being a
NoSQL database), is there any protocol for Solr (I don't mean a low-level
protocol, Z39.50 is just an example) through which it can integrate with
other clients or anything else?


Re: Is there a way to speed up my import

2013-06-27 Thread Gora Mohanty
On 27 June 2013 14:12, Mysurf Mail stammail...@gmail.com wrote:
 I just configured it with the caching and it works mighty fast now.
 Instead of an unbelievable number of queries it queries only 4 times.
 CPU usage has moved from the DB to the Solr machine, but only for a very
 short time.

 Problem :
 I dont see the multi value fields (Inner Entities) anymore
 This is  my configuration
[...]

Please check the syntax of your where clause against
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
Your inner entities should have clauses like
where="ResourceId=PackageVersion.PackageId".
I am also not sure why you have the strange
square brackets.
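
For example, the first inner entity rewritten that way would look roughly like
this (a sketch; note that the cache key column, ResourceId, also has to be
selected in the query so the cache can be built from it):

<entity name="PackageTag" pk="ResourceId"
        processor="CachedSqlEntityProcessor"
        query="SELECT ResourceId, [Text] PackageTag from [dbo].[Tag]"
        where="ResourceId=PackageVersion.PackageId">
</entity>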

Regards,
Gora


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Hi,

On Thu, Jun 27, 2013 at 12:23 PM, Robert Krüger krue...@lesspain.de wrote:
 Hi,

 I have a desktop application where I am abusing solr as an embedded
 database accessing it and I am quite happy with everything.
 Performance is more than goog enough for my use case and Solr's query
 capabilities match the requirements of my app quite well. However, I
 have the well-known performance warnings (see subject) in the log
 whenever I index a lot of documents, although I never experience any
 performance problems (might be hidden, though). The properties of my
 app are:

 - I (soft-)commit after every indexed item because I need the changes
 to be visible immediately
 - The commits are serialized
 - I do not have any warming queries configured

 I have read the FAQ but don't see anthing that helps in my case. As I
 said, I am happy with everything as it is but the warning makes me a
 bit nervous (and maybe at some point my customers when their logs are
 full of those warnings). What could I do to eliminate it? Can I
 configure only one searcher to be used or anything like that?

 Thanks for any hints,

 Robert

sometimes forcing oneself to describe a problem is the first step to a
solution. I just realized that I also had an autoCommit statement in
my config with the exact same interval that seemed to lie between
the warnings.

I removed that, because I don't think I really need it, and now the
warnings are gone. So it seems it happened whenever my manual commits
overlapped with an autocommit, which, of course, was more likely when
many commits were issued in sequence.
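
For reference, the two solrconfig.xml pieces involved look roughly like this
(a sketch only; the 15-second interval is illustrative, not my actual setting):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- with openSearcher left at its default of true, every autoCommit
       opens a new searcher and can overlap with explicit soft commits -->
  <autoCommit>
    <maxTime>15000</maxTime>
  </autoCommit>
</updateHandler>

<!-- cap on concurrently warming searchers; the PERFORMANCE WARNING is
     logged whenever more than one searcher is warming at once -->
<maxWarmingSearchers>2</maxWarmingSearchers>

Removing the autoCommit (as above) or setting <openSearcher>false</openSearcher>
inside it both stop the autocommit from opening extra searchers.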


displaying one result per domain

2013-06-27 Thread Wojciech Kapelinski
I'm looking for a neat solution to replace the default of multiple results
from a single domain in the SERP

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html  etc

with only one result per domain [the main URL by default]

somepage.com
otherpage.net
completelydifferentpage.org

Tried grouping by Carrot2 but it's not exactly what I'm looking for.

Thanks in advance.


Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky

No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to 
consider tokenizing or using ngrams to avoid the need for wildcards.


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards takes a lot of resources but I'm not worried
with that

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2 



how to delete on column of a doc in solr

2013-06-27 Thread anurag.jain
In my solr schema there is one dynamic field.

   <dynamicField name="jobs_*" type="float" indexed="true" stored="true"/>

So I have one doc value:

"docs": [
{
"last_name": "Jain",
"state_name": "rajasthan",
"mobile_no": 234534564621,
"id": 4,
"jobs_6554": 6554,

}, ...]
Now I just want to delete one column, meaning jobs_6554, not the complete doc.
How is that possible in Solr?

So after the delete, docs will be:

"docs": [
{
"last_name": "Jain",
"state_name": "rajasthan",
"mobile_no": 234534564621,
"id": 4
}, ...]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-delete-on-column-of-a-doc-in-solr-tp4073587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: displaying one result per domain

2013-06-27 Thread Erik Hatcher
Extract the domain (the main URL you mention) into its own indexed field and 
use field collapsing/grouping: http://wiki.apache.org/solr/FieldCollapsing
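
For example, if the host goes into a field called "domain" (the field name
here is just an assumption), the query would be something like:

http://localhost:8983/solr/select?q=foo&group=true&group.field=domain&group.limit=1

which returns at most one document per distinct domain value.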

Erik

On Jun 27, 2013, at 08:18 , Wojciech Kapelinski wrote:

 I'm looking for a neat solution to replace default multiple results from
 single domain in SERP
 
 somepage.com/contact.html
 somepage.com/aboutus.html
 otherpage.net/info.html
 somepage.com/directions.html  etc
 
 with only one result per each domain [main URL by default]
 
 somepage.com
 otherpage.net
 completelydifferentpage.org
 
 Tried grouping by Carrot2 but it's not exactly what I'm looking for.
 
 Thanks in advance.



Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
The stored and indexed string is actually a URL like
"http://www.youtube.com/somethingsomething".
It looks like removing the quotes does the job: iframe:*youtube* - or am I
wrong? For now, performance is not an issue, but accuracy is, and I would
like to know, for example, how many URLs have an iframe source leading to
YouTube. So a query like iframe:*youtube* with max rows 10 or
something will return in the response's numFound field the total number of
pages that have an iframe tag with a source matching *youtube*, no?


On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:

 No, you cannot use wildcards within a quoted term.

 Tell us a little more about what your strings look like. You might want to
 consider tokenizing or using ngrams to avoid the need for wildcards.

 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 3:33 AM
 To: solr-user@lucene.apache.org
 Subject: Solr admin search with wildcard


 I'm looking to search (in the solr admin search screen) a certain field
 for:

 *youtube*

 I know that leading wildcards takes a lot of resources but I'm not worried
 with that

 My only question is about the syntax, would this work:

 field:*youtube* ?

 Thanks,

 I'm using Solr 3.6.2



Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky
Just copyField from the string field to a text field and use standard 
tokenization, then you can search the text field for youtube or even 
something that is a component of the URL path. No wildcard required.
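
In schema.xml that would be something along these lines (a sketch; the
iframe_text field name is made up, and text_general is the stock tokenized
field type from the example schema):

<field name="iframe" type="string" indexed="true" stored="true"/>
<field name="iframe_text" type="text_general" indexed="true" stored="false"/>
<copyField source="iframe" dest="iframe_text"/>

After re-indexing, check on the admin Analysis screen exactly which tokens
the URL is split into for that field type, and then query the iframe_text
field with a plain term instead of a leading wildcard.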


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Thursday, June 27, 2013 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr admin search with wildcard

The stored and indexed string is actually a url like 
"http://www.youtube.com/somethingsomething".
It looks like removing the quotes does the job: iframe:*youtube* or am I
wrong ? For now, performance is not an issue, but accuracy is and I would
like to know for example how many URLS have iframe source leading to
YouTube for example. So query like: iframe:*youtube* with max rows 10 or
something will return in the response numFound field the total number of
pages that have a tag ifarme with a source matching *youtube, No ?


On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky 
j...@basetechnology.comwrote:



No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to
consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-Original Message- From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard


I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards takes a lot of resources but I'm not worried
with that

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2





Re: how to delete on column of a doc in solr

2013-06-27 Thread Jack Krupansky

Atomic update. For example:


curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' -d '
[{"id": "text-1", "text_ss": {"set": null}}]'

(From the book!)

That's for one document. If you want to do that for all documents, you will 
have to iterate yourself.


But... it sounds like you have arbitrary, unknown field names (dynamic). If 
you want to delete them, you will need to know the field name. You will have 
to write a loop that reads every document, figures out the dynamic field 
name, and then you can update with atomic update.
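
A minimal SolrJ sketch of the same atomic update for a single document (the
field name and id value are taken from the example above; the rest is
illustrative, and atomic updates need the <updateLog/> enabled in
solrconfig.xml plus stored fields):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RemoveFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "4");
        // a {"set": null} value tells Solr to remove the jobs_6554 field
        Map<String, Object> removeField = new HashMap<String, Object>();
        removeField.put("set", null);
        doc.addField("jobs_6554", removeField);

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}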


You may want to rethink your data model.

-- Jack Krupansky

-Original Message- 
From: anurag.jain

Sent: Thursday, June 27, 2013 8:28 AM
To: solr-user@lucene.apache.org
Subject: how to delete on column of a doc in solr

In my solr schema there is one dynamic field.

  dynamicField name=jobs_*  type=floatindexed=true
stored=true/
So I have one doc value,

docs: [
{
last_name: Jain,
state_name: rajasthan,
mobile_no: 234534564621,
id: 4,
jobs_6554: 6554,

},...]
Now I just want to delete one column, means jobs_6554 not the complete doc.
How it can possible in solr.

So after delete, docs will be.

docs: [
{
last_name: Jain,
state_name: rajasthan,
mobile_no: 234534564621,
id: 4
},...]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-delete-on-column-of-a-doc-in-solr-tp4073587.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: displaying one result per domain

2013-06-27 Thread Jack Krupansky
The URL Classify Update Processor can take a URL and split it into pieces, 
including the host name.


http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html

Unfortunately, the Javadoc is sparse, not even one example.

I have some examples in the book.

You can also use a regular expression tokenfilter to extract the host name 
as well.


And you can use standard Solr grouping to group by the field containing 
host name.


-- Jack Krupansky

-Original Message- 
From: Wojciech Kapelinski

Sent: Thursday, June 27, 2013 8:18 AM
To: solr-user@lucene.apache.org
Subject: displaying one result per domain

I'm looking for a neat solution to replace default multiple results from
single domain in SERP

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html  etc

with only one result per each domain [main URL by default]

somepage.com
otherpage.net
completelydifferentpage.org

Tried grouping by Carrot2 but it's not exactly what I'm looking for.

Thanks in advance. 



Re: Classic 4.2 master-slave replication not completing

2013-06-27 Thread Neal Ensor
Okay, I have done this (updated to 4.3.1 across master and four slaves; one
of these is my own PC for experiments, it is not being accessed by clients).

Just had a minor replication this morning, and all three slaves are stuck
again.  Replication supposedly started at 8:40, ended 30 seconds later or
so (on my local PC, set up identically to the other three slaves).  The
three slaves will NOT complete the roll-over to the new index.  All three
index folders have a write.lock and latest files are dated 8:40am (now it
is 8:54am, with no further activity in the index folders).  There exists an
index.2013062708461 (or some variation thereof) in all three slaves'
data folder.

The seemingly-relevant thread dump of a snappuller thread on each of
these slaves:

   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
   - java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
   - java.util.concurrent.FutureTask.get(FutureTask.java:83)
   -
   
org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
   -
   org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
   -
   
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
   - org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
   -
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   -
   java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   - java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
   -
   
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   -
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   - java.lang.Thread.run(Thread.java:662)


Here they sit.  My local PC slave replicated very quickly, switched over
to the new generation (206) immediately.  I am not sure why the three
slaves are dragging on this.  If there's any configuration elements or
other details you need, please let me know.  I can manually kick them by
reloading the core from the admin pages, but obviously I would like this to
be a hands-off process.  Any help is greatly appreciated; this has been
bugging me for some time now.



On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 A bunch of replication related issues were fixed in 4.2.1 so you're
 better off upgrading to 4.2.1 or later (4.3.1 is the latest release).

 On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:
  As a bit of background, we run a setup (coming from 3.6.1 to 4.2
 relatively
  recently) with a single master receiving updates with three slaves
 pulling
  changes in.  Our index is around 5 million documents, around 26GB in size
  total.
 
  The situation I'm seeing is this:  occasionally we update the master, and
  replication begins on the three slaves, seems to proceed normally until
 it
  hits the end.  At that point, it sticks; there's no messages going on
 in
  the logs, nothing on the admin page seems to be happening.  I sit there
 for
  sometimes upwards of 30 minutes, seeing no further activity in the index
  folder(s).   After a while, I go to the core admin page and manually
 reload
  the core, which catches it up.  It seems like the index readers /
 writers
  are not releasing the index otherwise?  The configuration is set to
 reopen;
  very occasionally this situation actually fixes itself after a longish
  period of time, but it seems very annoying.
 
  I had at first suspected this to be due to our underlying shared (SAN)
  storage, so we installed SSDs in all three slave machines, and moved the
  entire indexes to those.  It did not seem to affect this issue at all
  (additionally, I didn't really see the expected performance boost, but
  that's a separate issue entirely).
 
  Any ideas?  Any configuration details I might share/reconfigure?  Any
  suggestions are appreciated. I could also upgrade to the later 4.3+
  versions, if that might help.
 
  Thanks!
 
  Neal Ensor
  nen...@gmail.com



 --
 Regards,
 Shalin Shekhar Mangar.



Dot operater issue.

2013-06-27 Thread Srinivasa Chegu
Hi team,

When the user enters the search term h.e.r.b.a.l in the search textbox and
clicks the search button, the Solr search engine does not return any results.
As I can see, Solr is accepting the request parameter as h.e.r.b.a.l.
However, we have many records with the string h.e.r.b.a.l as part of the
product name.

Looks like there is an issue with the dot character in the search term. If we
enter the search term herbal then it returns search results.

Our requirement is that when the search term is h.e.r.b.a.l, it needs to
display results matching that dotted form.

Please help us with this issue.

Regards
Srinivas


::DISCLAIMER::


The contents of this e-mail and any attachment(s) are confidential and intended 
for the named recipient(s) only.
E-mail transmission is not guaranteed to be secure or error-free as information 
could be intercepted, corrupted,
lost, destroyed, arrive late or incomplete, or may contain viruses in 
transmission. The e mail and its contents
(with or without referred errors) shall therefore not attach any liability on 
the originator or HCL or its affiliates.
Views or opinions, if any, presented in this email are solely those of the 
author and may not necessarily reflect the
views or opinions of HCL or its affiliates. Any form of reproduction, 
dissemination, copying, disclosure, modification,
distribution and / or publication of this message without the prior written 
consent of authorized representative of
HCL is strictly prohibited. If you have received this email in error please 
delete it and notify the sender immediately.
Before opening any email and/or attachments, please check them for viruses and 
other defects.




Re: Dot operater issue.

2013-06-27 Thread Sandeep Mestry
Hi Sri,

This depends on how the fields (that hold the value) are defined and how
the query is generated.
Try running the query in solr console and use debug=true to see how the
query string is getting parsed.

If that doesn't help, then could you answer the following 3 questions relating
to your question:

1) field definition in schema.xml
2) solr query url
3) parser config from solrconfig.xml


Thanks,
Sandeep


On 27 June 2013 10:41, Srinivasa Chegu cheg...@hcl.com wrote:

 Hi team,

 When the user enter search term as h.e.r.b.a.l  in the search textbox
 and click on search button then  SOLR search engine is not returning any
  results found. As I can see SOLR is accepting the request parameter as
 h.e.r.b.a.l. However we have many records with the string h.e.r.b.a.l as
 part of the product name.

 Look like there is an issue with dot operator in the search term.  If we
 enter search term as herbal then it is returning search results .

 Our requirement is search term should be h.e.r.b.a.l then it needs to
 display results based on dot operator .

 Please help us on this issue.

 Regards
 Srinivas





TermVector and Sharding issue

2013-06-27 Thread Stanislav Sandalnikov
Hello everyone,

I saw that the ticket regarding this issue is still open (
https://issues.apache.org/jira/browse/SOLR-4479). There is last comment
that suggests to reindex documents with solr 4.2. I did reindex with 4.3
version but term vector still doesn't work producing null pointer
exception.

So, does anyone had the same problem? Is there a workaround?


Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
Forgive my ignorance but I want to be sure: do I add <copyField
source="iframe" dest="text"/> to solrindex-mapping.xml,
so that my solrindex-mapping.xml looks like this:
<fields>
  <field dest="content" source="content"/>
  <field dest="title" source="title"/>
  <field dest="iframe" source="iframe"/>
  <field dest="host" source="host"/>
  <field dest="segment" source="segment"/>
  <field dest="boost" source="boost"/>
  <field dest="digest" source="digest"/>
  <field dest="tstamp" source="tstamp"/>
  <field dest="id" source="url"/>
  <copyField source="url" dest="url"/>
  <copyField source="iframe" dest="text"/>   <!-- the new line -->
</fields>
<uniqueKey>url</uniqueKey>

And what do you mean by standard tokenization ?

Thanks!


On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky j...@basetechnology.comwrote:

 Just copyField from the string field to a text field and use standard
 tokenization, then you can search the text field for youtube or even
 something that is a component of the URL path. No wildcard required.


 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 8:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr admin search with wildcard


 The stored and indexed string is actually a url like
 "http://www.youtube.com/somethingsomething".
 It looks like removing the quotes does the job: iframe:*youtube* or am I
 wrong ? For now, performance is not an issue, but accuracy is and I would
 like to know for example how many URLS have iframe source leading to
 YouTube for example. So query like: iframe:*youtube* with max rows 10 or
 something will return in the response numFound field the total number of
 pages that have a tag ifarme with a source matching *youtube, No ?


 On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  No, you cannot use wildcards within a quoted term.

 Tell us a little more about what your strings look like. You might want to
 consider tokenizing or using ngrams to avoid the need for wildcards.

 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 3:33 AM
 To: solr-user@lucene.apache.org
 Subject: Solr admin search with wildcard


 I'm looking to search (in the solr admin search screen) a certain field
 for:

 *youtube*

 I know that leading wildcards takes a lot of resources but I'm not worried
 with that

 My only question is about the syntax, would this work:

 field:*youtube* ?

 Thanks,

 I'm using Solr 3.6.2





Re: Data Import Handler and Extract Handler

2013-06-27 Thread Gora Mohanty
On 27 June 2013 13:42, Venter, Scott scott.ven...@rmb.co.za wrote:
 Hi all,

 I am new to SOLR. I have been working through the SOLR 4 Cookbook and my 
 experiences so far have been great.

 I have worked through the extraction of PDF data recipe, and the Data import 
 recipe. I would now like to join these two things, i.e. I would like to do a 
 data import from a Database table of users, and then somehow associate 
 indexed PDF data with rows that were imported.

 I have a conceptual link between rows in the database and pdf documents, but 
 I don't know how to make a physical link between the two in SOLR. For 
 example, I know that user x has pdf documents a, b and c.

 If I have imported my users into SOLR using Data Import Handler, how would I

 1) import and associate the pdf documents using the extract mechanism, in 
 such a way that there is a link between user x and the 3 pdf documents as 
 described above?
[...]

Where are your PDF documents? Presumably on the filesystem
or available from a web service. What you can do is to have
two datasources in your DIH configuration file:
* The first one is a JdbcDataSource that extracts data from a
   database. Presumably, you already have this working.
* The second is a BinFileDataSource assuming that your
   PDF files are on the filesystem.
* In the top-level entity, select the user and the names of the
  associated PDF files.
* Use a nested inner entity with the dataSource attribute set
  to the BinFileDataSource, and use the TikaEntityProcessor
  to index the PDF files. The documentation on this is a little
  scattered, but see:
  http://wiki.apache.org/solr/TikaEntityProcessor
  
http://lucene.472066.n3.nabble.com/problem-to-indexing-pdf-directory-td3749554.html
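
Putting those pieces together, a data-config.xml for this might be shaped
roughly as follows (the table, column, and path names are invented for
illustration only):

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
    <dataSource name="bin" type="BinFileDataSource"/>
    <document>
      <entity name="user" dataSource="db"
              query="SELECT user_id, pdf_file FROM users">
        <field column="user_id" name="id"/>
        <entity name="pdf" dataSource="bin" processor="TikaEntityProcessor"
                url="/data/pdfs/${user.pdf_file}" format="text">
          <field column="text" name="pdf_text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>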

Regards,
Gora


Re: Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Dotan Cohen
On Thu, Jun 27, 2013 at 12:14 PM, Upayavira u...@odoko.co.uk wrote:
 can you give an example?


Thank you. This is an example query:
select
?q=search_field:iraq
fq={!cache=false}search_field:love%20obama
defType=edismax

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Classic 4.2 master-slave replication not completing

2013-06-27 Thread Mark Miller
Odd - looks like it's stuck waiting to be notified that a new searcher is ready.

- Mark

On Jun 27, 2013, at 8:58 AM, Neal Ensor nen...@gmail.com wrote:

 Okay, I have done this (updated to 4.3.1 across master and four slaves; one
 of these is my own PC for experiments, it is not being accessed by clients).
 
 Just had a minor replication this morning, and all three slaves are stuck
 again.  Replication supposedly started at 8:40, ended 30 seconds later or
 so (on my local PC, set up identically to the other three slaves).  The
 three slaves will NOT complete the roll-over to the new index.  All three
 index folders have a write.lock and latest files are dated 8:40am (now it
 is 8:54am, with no further activity in the index folders).  There exists an
 index.2013062708461 (or some variation thereof) in all three slaves'
 data folder.
 
 The seemingly-relevant thread dump of a snappuller thread on each of
 these slaves:
 
   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
   - java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
   - java.util.concurrent.FutureTask.get(FutureTask.java:83)
   -
   
 org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
   -
   org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
   -
   
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
   - org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
   -
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   -
   java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   - java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
   -
   
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   -
   
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   - java.lang.Thread.run(Thread.java:662)
 
 
 Here they sit.  My local PC slave replicated very quickly, switched over
 to the new generation (206) immediately.  I am not sure why the three
 slaves are dragging on this.  If there's any configuration elements or
 other details you need, please let me know.  I can manually kick them by
 reloading the core from the admin pages, but obviously I would like this to
 be a hands-off process.  Any help is greatly appreciated; this has been
 bugging me for some time now.
 
 
 
 On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:
 
 A bunch of replication related issues were fixed in 4.2.1 so you're
 better off upgrading to 4.2.1 or later (4.3.1 is the latest release).
 
 On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:
 As a bit of background, we run a setup (coming from 3.6.1 to 4.2
 relatively
 recently) with a single master receiving updates with three slaves
 pulling
 changes in.  Our index is around 5 million documents, around 26GB in size
 total.
 
 The situation I'm seeing is this:  occasionally we update the master, and
 replication begins on the three slaves, seems to proceed normally until
 it
 hits the end.  At that point, it sticks; there's no messages going on
 in
 the logs, nothing on the admin page seems to be happening.  I sit there
 for
 sometimes upwards of 30 minutes, seeing no further activity in the index
 folder(s).   After a while, I go to the core admin page and manually
 reload
 the core, which catches it up.  It seems like the index readers /
 writers
 are not releasing the index otherwise?  The configuration is set to
 reopen;
 very occasionally this situation actually fixes itself after a longish
 period of time, but it seems very annoying.
 
 I had at first suspected this to be due to our underlying shared (SAN)
 storage, so we installed SSDs in all three slave machines, and moved the
 entire indexes to those.  It did not seem to affect this issue at all
 (additionally, I didn't really see the expected performance boost, but
 that's a separate issue entirely).
 
 Any ideas?  Any configuration details I might share/reconfigure?  Any
 suggestions are appreciated. I could also upgrade to the later 4.3+
 versions, if that might help.
 
 Thanks!
 
 

ConcurrentUpdateSolrServer hanging

2013-06-27 Thread qungg
Hi,

I'm using concurrentUpdateSolrServer to do my incremental indexing nightly.
I have 50 shards to index into, about 10,000 documents each night. I start
one concurrentUpdateSolrServer on each shards and start to send documents.
The queue size for concurrentUpdateSolrServer is 100, and 4 threads. At the
end of the import, I send a commit using the same
concurrentUpdateSolrServer. The problem is that some of the
concurrentUpdateSolrServer instances are not sending the commit to the shards,
and the import task hangs for a couple of hours.

So I looked at the log and found that the shards received about 1000
documents a couple of hours later, followed by a commit. Is there any
method I can call to flush out documents before I send the commit? Or are
there any existing issues related to concurrentUpdateSolrServer and
this behavior?

Thanks,
Qun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Shawn Heisey
On 6/27/2013 5:59 AM, Robert Krüger wrote:
 sometime forcing oneself to describe a problem is the first step to a
 solution. I just realized that I also had an autocommit statement in
 my config with the exact same amount of time the seemed to be between
 the warnings.
 
 I removed that, because I don't think I really need it, and now the
 warnings are gone. So it seems it happened whenever my manual commits
 overlapped with an autocommit, which, of course, was more likely when
 many commits were issued in sequence.

If all you are doing is soft commits, your transaction logs are going to
grow out of control.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

My recommendation:

1) Remove all commits from your indexing application.
2) Configure autoCommit with values similar to that wiki page.
3) Configure autoSoftCommit to happen often.

The autoCommit must have openSearcher set to false.  For autoSoftCommit,
include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs
out.
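
In solrconfig.xml that combination looks roughly like this (the numbers
below are only illustrative; see the wiki page for recommended values):

  <autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>25000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>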

Thanks,
Shawn



solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
I was using ConcurrentUpdateSOLR for indexing documents to Solr. Later I had
a need to do portable indexing hence started using Embedded solr server.

I created a multithreaded program to create/submit the documents in batches
of 100 to the embedded Solr server (running inside the SolrJ indexing process),
but for some reason it takes more time to index the data when compared with
ConcurrentUpdateSolrServer (CUSS). I was under the assumption that the embedded
server would take less time compared to an HTTP update (made when using CUSS),
but I'm not sure why it takes more time...

Is there a way to speed up the indexing when using Embedded solr
serveretc..(something like specifying thread and queue size similar to
CUSS)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR online reference document - WIKI

2013-06-27 Thread Luis Lebolo
This page never came up on any of my Google searches, so thanks for the
heads up! Looks good.

-Luis


On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote:

 I just came across a wonderful online reference wiki for SOLR and thought
 of
 sharing it with the community..


 https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR online reference document - WIKI

2013-06-27 Thread Upayavira
It is all new, and as yet unreleased. It still has more work needed on
formatting, etc, so I guess you could say, make of it what you will, and
don't yet assume it will always be up and available.

Upayavira

On Thu, Jun 27, 2013, at 04:25 PM, Luis Lebolo wrote:
 This page never came up on any of my Google searches, so thanks for the
 heads up! Looks good.
 
 -Luis
 
 
 On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote:
 
  I just came across a wonderful online reference wiki for SOLR and thought
  of
  sharing it with the community..
 
 
  https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread Michael Della Bitta
Qun,

Are you using blockUntilFinished() and/or shutdown()?

One of the things to note is that a commit is just another document, so
writing a commit into the queue of the ConcurrentUpdateSolrServer isn't
enough to get it flushed out.
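
As a rough sketch of that pattern (the URL, queue size, and thread count
are placeholders):

import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ShardLoader {
  public static void load(String shardUrl, List<SolrInputDocument> docs) throws Exception {
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer(shardUrl, 100, 4);  // queue size 100, 4 sender threads
    for (SolrInputDocument doc : docs) {
      server.add(doc);            // buffered; background threads send it
    }
    server.blockUntilFinished();  // drain the queue before committing
    server.commit();              // the commit can no longer overtake queued documents
    server.shutdown();            // release the background threads
  }
}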


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Thu, Jun 27, 2013 at 10:21 AM, qungg qzheng1...@gmail.com wrote:

 Hi,

 I'm using concurrentUpdateSolrServer to do my incremental indexing nightly.
 I have 50 shards to index into, about 10,000 documents each night. I start
 one concurrentUpdateSolrServer on each shards and start to send documents.
 The queue size for concurrentUpdateSolrServer is 100, and 4 threads. At the
 end of the import, i will send commit using the same
 concurrentUpdateSolrServer. The problem is some of the
 concurrentUpdateSolrServer is not sending the commit to the shards and the
 import task hangs for a couple hours.

 So I looked at the log and find out that the shards received about 1000
 document couple hours later following with a commit. Is there anything
 methods I can call to flush out documents before I send the commit? Or are
 there any existing issue related to concurrentUpdateSolrServer related to
 this?

 Thanks,
 Qun



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Configuring Solr to retrieve documents?

2013-06-27 Thread Michael Della Bitta
Hi,

I haven't used it yet, but I believe you can do this using the
FileDataSource feature of DataImportHandler:

http://wiki.apache.org/solr/DataImportHandler#FileDataSource
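
A rough, unverified sketch of what that can look like in data-config.xml,
using FileListEntityProcessor to walk the directory (paths and field names
are made up); the import is then triggered with a DIH command such as
/dataimport?command=full-import rather than the post command:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
              dataSource="null" baseDir="/path/to/docs" fileName=".*\.txt" recursive="true">
        <entity name="file" processor="PlainTextEntityProcessor"
                url="${files.fileAbsolutePath}">
          <field column="plainText" name="content"/>
        </entity>
      </entity>
    </document>
  </dataConfig>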

HTH,


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jun 26, 2013 at 2:12 PM, aspielman aspiel...@gmail.com wrote:

 Is it possible to configure Solr to automatically grab documents in a
 specified directory, without having to use the post command?

 I've not found any way to do this, though admittedly, I'm not terribly
 experienced with config files of this type.

 Thanks!



 -
 | A.Spielman |
 In theory there is no difference between theory and practice. In practice
 there is. - Chuck Reid
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Configuring-Solr-to-retrieve-documents-tp4073372.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread qungg
Hi Michael,

I realized that I might have to use blockUntilFinished before commit, but do
I have to use shutdown as well??

Thanks,
Qun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620p4073651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Mark Bennett
Jack,

Did you ever find a fix for this?

I'm having similar issues (different parts of solrconfig) and my guess is it's 
a config issue somewhere, vs. a proper casting problem, some nested init issue.

Was curious what you found?


On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.
 
 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.
 
 Jack
 
 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:
 
 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
 
 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.
 
 The possibly relevant parts (my contributions) are these:
 
 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler
 
 Thanks
 Jack
 
 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…
 
 - Mark
 
 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:
 
 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.
 
 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.
 
 All the jars are in place; I'm able to import that class in my code.
 
 Is there any news on that issue?
 
 Many thanks
 Jack
 



Re: solrj indexing using embedded solr is slow

2013-06-27 Thread Shawn Heisey

On 6/27/2013 9:19 AM, Learner wrote:

I was using ConcurrentUpdateSOLR for indexing documents to Solr. Later I had
a need to do portable indexing hence started using Embedded solr server.

I created a multithreaded program to create /submit the documents in batch
of 100 to Embedded SOLR server (running inside Solrj indexing process) but
for some reason it takes more time to index the data when compared with
ConcurrentUpdateSOLR server(CUSS). I was under assumption that embedded
server would take less time compared to http update (made when using CUSS)
but not sure why it takes more time...

Is there a way to speed up the indexing when using Embedded solr
serveretc..(something like specifying thread and queue size similar to
CUSS)?


A lot more time has been spent optimizing the traditional Solr server 
model than the embedded version.


If you want the same performance from Embedded that you get from 
Concurrent, you'll need to use that object in multiple threads that you 
create yourself.  The Concurrent object handles all that threading for 
you, but due to its nature, Embedded can't.  You say that your program 
is multithreaded, so I really don't know what's going on here.
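
For what it's worth, a bare-bones sketch of driving an EmbeddedSolrServer from
an explicit thread pool (the solr home path and core name are assumptions, and
error handling is kept minimal):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
  public static void index(List<List<SolrInputDocument>> batches) throws Exception {
    CoreContainer container = new CoreContainer("/path/to/solr/home");
    container.load();
    final EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (final List<SolrInputDocument> batch : batches) {   // e.g. batches of 100 docs
      pool.submit(new Runnable() {
        public void run() {
          try {
            server.add(batch);             // unlike CUSS, failures surface right here
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.commit();
    container.shutdown();
  }
}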


An FYI that on something that might have escaped your awareness: CUSS 
swallows exceptions - it will never inform the calling application about 
errors that occur, unless you override its handleError method in some 
way, and I don't know what is required to make it do that.  This is part 
of why CUSS is so fast - it returns to the calling application 
*immediately*, no matter what actually happens in the background while 
talking to the server.


Thanks,
Shawn



Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread Shawn Heisey

On 6/27/2013 9:32 AM, Michael Della Bitta wrote:

Are you using blockUntilFinished() and/or shutdown()?

One of the things to note is that a commit is just another document, so
writing a commit into the queue of the ConcurrentUpdateSolrServer isn't
enough to get it flushed out.


ConcurrentUpdateSolrServer contains this little bit of code:

// this happens for commit...
if (req.getDocuments() == null || req.getDocuments().isEmpty()) {
  blockUntilFinished();
  return server.request(request);
}

Unless the comment is incorrect or there's a bug, sending a commit() 
will inherently do the blockUntilFinished().


Thanks,
Shawn



Re: URL search and indexing

2013-06-27 Thread Erick Erickson
Right, string fields are a little tricky, they're easy to confuse with
fields that actually _do_ something.

By default, norms and term frequencies are turned off for types based on '
class=solr.StrField '. So any field length normalization (i.e. terms that
appear in shorter fields count more) and term frequencies calculations are
_not_ include in the score calculation.

Try blowing your index away and adding this to your fields to see the
difference

omitNorms=false omitTermFreqAndPositions=false

You probably want to either turn these on explicitly for your string types
or use a type based on 'class=solr.TextField ' since these options
default to false for text fields. If you use something like
keywordTokenizerFactory you also won't get your URL split up into pieces.
And in that case you can also normalize the values with something like
lowerCaseFilter which you can't do with string types since they're
completely unanalyzed.
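
For instance, a type along those lines for a url field could look like this
(the type name is arbitrary); norms and term frequencies stay enabled because
it is a TextField:

<fieldType name="url_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="url" type="url_keyword" indexed="true" stored="true" required="true"/>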

Best
Erick


On Wed, Jun 26, 2013 at 11:34 AM, Flavio Pompermaier
pomperma...@okkam.itwrote:

 Obviously I messed up with email thread...however I found a problem
 indexing my document via post.sh.
 This is basically my schema.xml:

 <schema name="dopa-schema" version="1.5">
   <fields>
     <field name="url" type="string" indexed="true" stored="true"
            required="true" multiValued="false"/>
     <field name="itemid" type="string" indexed="true" stored="true"
            multiValued="true"/>
     <field name="_version_" type="long" indexed="true" stored="true"/>
   </fields>
   <uniqueKey>url</uniqueKey>
   <types>
     <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
     <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
                positionIncrementGap="0"/>
   </types>
 </schema>

 and this is the document I tried to upload via post.sh:

 <add>
   <doc>
     <field name="url">http://test.example.org/first.html</field>
     <field name="itemid">1000</field>
     <field name="itemid">1000</field>
     <field name="itemid">1000</field>
     <field name="itemid">5000</field>
   </doc>
   <doc>
     <field name="url">http://test.example.org/second.html</field>
     <field name="itemid">1000</field>
     <field name="itemid">5000</field>
   </doc>
 </add>

 When playing with administration and debugging tools I discovered that
 searching for q=itemid:5000 gave me the same score for those docs, while I
 was expecting different term frequencies between the first and the second.
 In fact, using Java to upload documents led to correct results (3
 occurrences of item 1000 in the first doc and 1 in the second), e.g.:
 document1.addField("itemid", 1000);
 document1.addField("itemid", 1000);
 document1.addField("itemid", 1000);

 Am I right or am I missing something else?


 On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  If there is a bug... we should identify it. What's a sample post command
  that you issued?
 
 
  -- Jack Krupansky
 
  -Original Message- From: Flavio Pompermaier
  Sent: Wednesday, June 26, 2013 10:53 AM
 
  To: solr-user@lucene.apache.org
  Subject: Re: URL search and indexing
 
  I was doing exactly that and, thanks to the administration page and
  explanation/debugging, I checked if results were those expected.
  Unfortunately, results were not correct submitting updates trough post.sh
  script (that use curl in the end).
  Probably, if it founds the same tag (same value for the same field-name),
  it will collapse them.
  Rewriting the same document in Java and submitting the updates did the
  things work correctly.
 
  In my opinion this is a bug (of the entire process, then I don't know it
  this is a problem of curl or of the script itself).
 
  Best,
  Flavio
 
  On Wed, Jun 26, 2013 at 4:18 PM, Erick Erickson erickerick...@gmail.com
 *
  *wrote:
 
   Flavio:
 
  You mention that you're new to Solr, so I thought I'd make sure
  you know that the admin/analysis page is your friend! I flat
  guarantee that as you try to index/search following the suggestions
  you'll scratch your head at your results and you'll discover that
  the analysis process isn't doing quite what you expect. The
  admin/analysis page shows you the transformation of the input
  at each stage, i.e. how the input is tokenized, what transformations
  are applied to each token etc. It's invaluable!
 
  Best
  Erick
 
  P.S. Feel free to un-check the verbose box, it provides lots
  of information but can be overwhelming, especially at first!
 
  On Wed, Jun 26, 2013 at 12:20 AM, Flavio Pompermaier
  pomperma...@okkam.it wrote:
   Ok thank you all for the great help!
   Now I'm ready to start playing with my index!
  
   Best,
   Flavio
  
  
   On Tue, Jun 25, 2013 at 11:40 PM, Jack Krupansky 
  j...@basetechnology.comwrote:
  
   Yeah, URL Classify does only do so much. That's why you need to
 combine
   multiple methods.
  
   As a fourth method, you could code up a short JavaScript **
   StatelessScriptUpdateProcessor that did something like take a
  full
   domain name (such as output by URL Classify) and turn it into
 multiple
   values, each with more of the prefix removed, so that 
  

Field Query After Collapse.Field?

2013-06-27 Thread slevytam
Hello,

I've struggling to find a way to query after collapse.field is performed and
I'm hoping someone can help.

I'm doing a multiple core(index) search which generates results that can
have varying fields.
ex.
entry_id, entry_starred
entry_id, entry_read

I perform a collapse.field on entry_id which yields:
ex. entry_id, entry_starred, entry_read

But if I try to do a fq on one of the fields
ex. fq=!entry_read:1

The fq is performed before the collapse leading to incorrect results.

Is there anyway to perform the field query after the results are collapsed?

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StatsComponent doesn't work if field's type is TextField - can I change field's type to String

2013-06-27 Thread Erick Erickson
I stand corrected, you're absolutely right about string types. But I still
don't think text types are supported, at least in my quick test of the
stock Solr distro, trying to gather stats on the subject field produced
the error below. Note that string is a completely unanalyzed type, no
tokenization etc. so it's actually a different beast than text types.

Field type
text_general{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={class=solr.TextField,
positionIncrementGap=100}} is not currently supported


On Wed, Jun 26, 2013 at 11:37 AM, Elran Dvir elr...@checkpoint.com wrote:

 Erick, thanks for the response.

 I think the stats component works with strings.

 In StatsValuesFactory, I see the following code:

 public static StatsValues createStatsValues(SchemaField sf) {
 ...
else if (StrField.class.isInstance(fieldType)) {
   return new StringStatsValues(sf);
 } 
   }

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, June 26, 2013 5:30 PM
 To: solr-user@lucene.apache.org
 Subject: Re: StatsComponent doesn't work if field's type is TextField -
 can I change field's type to String

 From the stats component page:

 The stats component returns simple statistics for indexed numeric fields
 within the DocSet

 So string, text, anything non-numeric won't work. You can declare it
 multiValued but then you have to add multiple values for the field when you
 send the doc to Solr or implement a custom update component to break them
 up. At least there's no filter that I know of that takes a delimited set of
 numbers and transforms them.

 FWIW,
 Erick

 On Wed, Jun 26, 2013 at 4:14 AM, Elran Dvir elr...@checkpoint.com wrote:
  Hi all,
 
  StatsComponent doesn't work if field's type is TextField.
  I get the following message:
  Field type
  textstring{class=org.apache.solr.schema.TextField,analyzer=org.apache.
  solr.analysis.TokenizerChain,args={positionIncrementGap=100,
  sortMissingLast=true}} is not currently supported.
 
  My field configuration is:
 
  <fieldType name="mvstring" class="solr.TextField" positionIncrementGap="100"
             sortMissingLast="true">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="\n"/>
    </analyzer>
  </fieldType>

  <field name="myField" type="mvstring" indexed="true" stored="false"
         multiValued="true"/>
 
  So, the reason my field is of type TextField is that in the document
 indexed there may be multiple values in the field separated by new lines.
  The tokenizer is splitting it to multiple values and the field is
 indexed as multi-valued field.
 
  Is there a way I can define the field as regular String field? Or a way
 to make StatsComponent work with TextField?
 
  Thank you very much.

 Email secured by Check Point



Re: Querying multiple collections in SolrCloud

2013-06-27 Thread Erick Erickson
I'd _guess_ that this is unsupported across collections if
for no other reason than scores really aren't comparable
across collections and the default ordering within groups
is score. This is really a federated search type problem.

But if it makes sense to use N collections for other reasons,
it's really the same thing as grouping functionally, you just
send a separate request to each collection and combine
the results of those N requests rather than from N
groups in a single query. If the collections are hosted on
different machines for instance, you might get quicker
overall response by firing off parallel queries,
It Depends (tm)...
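
A rough SolrJ sketch of that fan-out-and-merge approach (the collection URLs
are placeholders, and the merge criterion is left to the application since raw
scores aren't comparable across collections):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FanOutQuery {
  public static List<QueryResponse> query(final String queryText) throws Exception {
    List<String> urls = Arrays.asList(
        "http://host1:8983/solr/books",
        "http://host2:8983/solr/songs");
    ExecutorService pool = Executors.newFixedThreadPool(urls.size());
    List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
    for (final String url : urls) {
      futures.add(pool.submit(new Callable<QueryResponse>() {
        public QueryResponse call() throws Exception {
          // same query against each collection, in parallel
          return new HttpSolrServer(url).query(new SolrQuery(queryText).setRows(20));
        }
      }));
    }
    List<QueryResponse> responses = new ArrayList<QueryResponse>();
    for (Future<QueryResponse> f : futures) {
      responses.add(f.get());     // merge getResults() by an application-level criterion
    }
    pool.shutdown();
    return responses;
  }
}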

Best
Erick


On Wed, Jun 26, 2013 at 1:46 PM, Chris Toomey ctoo...@gmail.com wrote:

 Thanks Erick, that's a very helpful answer.

 Regarding the grouping option, does that require all the docs to be put
 into a single collection, or could it be done with across N collections
 (assuming each collection had a common type field for grouping on)?

 Chris


 On Wed, Jun 26, 2013 at 7:01 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  bq: Would the above setup qualify as multiple compatible collections
 
  No. While there may be enough fields in common to form a single query,
  the TF/IDF calculations will not be compatible and the scores from the
  various collections will NOT be comparable. So simply getting the list of
  top N docs will probably be dominated by the docs from a single type.
 
  bq: How does SolrCloud combine the query results from multiple
 collections?
 
  It doesn't. SolrCloud sorts the results from multiple nodes in the
  _same_ collection
  according to whatever sort criteria are specified, defaulting to score.
  Say you
  ask for the top 20 docs. A node from each shard returns the top 20 docs
  for that
  shard. The node processing them just merges all the returned lists and
  only keeps
  the top 20.
 
  I don't think your last two questions are really relevant, SolrCloud
  isn't built to
  query multiple collections and return the results coherently.
 
  The root problem here is that you're trying to compare docs from
  different collections for goodness to return the top N. This isn't
  actually hard
  _except_ when goodness is the score, then it just doesn't work. You
 can't
  even compare scores from different queries on the _same_ collection, much
  less different ones. Consider two collections, books and songs. One
  consists
  of lots and lots of text and the term frequency and inverse doc freq
  (TF/IDF)
  will be hugely different than songs. Not to mention field length
  normalization.
 
  Now, all that aside there's an option. Index all the docs in a single
  collection and
  use grouping (aka field collapsing) to get a single response that has the
  top N
  docs from each type (they'll be in different sections of the original
  response) and present
  them to the user however makes sense. You'll get hands on experience in
  why this isn't something that's easy to do automatically if you try to
  sort these
  into a single list by relevance G...
 
  Best
  Erick
 
  On Tue, Jun 25, 2013 at 3:35 PM, Chris Toomey ctoo...@gmail.com wrote:
   Thanks Jack for the alternatives.  The first is interesting but has the
   downside of requiring multiple queries to get the full matching docs.
   The
   second is interesting and very simple, but has the downside of not
 being
   modular and being difficult to configure field boosting when the
   collections have overlapping field names with different boosts being
  needed
   for the same field in different document types.
  
   I'd still like to know about the viability of my original approach
 though
   too.
  
   Chris
  
  
   On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky 
 j...@basetechnology.com
  wrote:
  
   One simple scenario to consider: N+1 collections - one collection per
   document type with detailed fields for that document type, and one
  common
   collection that indexes a subset of the fields. The main user query
  would
   be an edismax over the common fields in that main collection. You
 can
   then display summary results from the common collection. You can also
  then
   support drill down into the type-specific collection based on a
 type
   field for each document in the main collection.
  
   Or, sure, you actually CAN index multiple document types in the same
   collection - add all the fields to one schema - there is no time or
  space
   penalty if most of the field are empty for most documents.
  
   -- Jack Krupansky
  
   -Original Message- From: Chris Toomey
   Sent: Tuesday, June 25, 2013 6:08 PM
   To: solr-user@lucene.apache.org
   Subject: Querying multiple collections in SolrCloud
  
  
   Hi, I'm investigating using SolrCloud for querying documents of
  different
   but similar/related types, and have read through docs. on the wiki and
  done
   many searches in these archives, but still have some questions.
  Thanks
  in
   advance for your help.
  
   

Change of email

2013-06-27 Thread abillavara

Dear List Managers
I've changed the email address that I'd like to use for the solr-user list, as
it's filling up my work email to the point of insanity.


Regardless of the change in the solr-user community, it still keeps
sending the emails of all threads and replies to my work email.  Would
you please be so kind as to effect this change for me?  The new email is a
yahoo email, and is already showing in my preferences.


Thank you kindly
Anria


Re: Replicating files containing external file fields

2013-06-27 Thread Erick Erickson
Haven't tried this, but I _think_ you can use the
confFiles trick with relative paths, see:
http://wiki.apache.org/solr/SolrReplication

Or just put your EFF files in the data dir?
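
For reference, the untested idea would be something like this in the master's
replication handler (the relative ../data path is exactly the part that needs
verifying; file names are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,../data/external_myfield.txt</str>
  </lst>
</requestHandler>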

Best
Erick


On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
arunrangara...@gmail.comwrote:

 From https://wiki.apache.org/solr/SolrReplication I understand that index
 dir and any files under the conf dir can be replicated to slaves. I want to
 know if there is any way the files under the data dir containing external
 file fields can be replicated. These are not replicated by default.
 Currently we are running the ext file field reload script on both the
 master and the slave and then running reloadCache on each server once they
 are loaded.



Re: solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
Shawn,

Thanks a lot for your reply.

I have pasted my entire code below, it would be great if you can let me know
if I am doing anything wrong in terms of running the code in multithreaded
environment.

http://pastebin.com/WRLn3yWn



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636p4073711.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Change of email

2013-06-27 Thread Upayavira


On Thu, Jun 27, 2013, at 06:48 PM, abillav...@innoventsolutions.com
wrote:
 Dear List Managers
 I've changed my email that I'd like to use for the solr-user list, as 
 it's filling up my work email to the point of insanity.
 
 Regardless of the change in the solr-user community, it still keeps 
 sending the emails of all threads and replies to my work email.  Would 
 you please be so kind to affect this change for me?  The new email is a 
 yahoo email, and is already showing in my preferences

Simply unsubscribe yourself (mail
solr-user-unsubscr...@lucene.apache.org) from your work address. Then
subscribe from the new address.

If you have difficulties with unsubscribing, then a mail administrator
can help you sort it.

Upayavira


Querying across multiple *identical* Collections

2013-06-27 Thread Otis Gospodnetic
Hi,

This search across multiple collections question has come up a few
times recently:

http://search-lucene.com/m/2Q1BE0IT4Y/subj=Search+across+multiple+collections
http://search-lucene.com/m/5JQrXIyhQQ1/subj=Querying+multiple+collections+in+SolrCloud

One important variation of this Q is - can one search across MULTIPLE
IDENTICAL collections.

The use case is that you need to index/archive a lot of data, but
because your searches have a time range filter, instead of having 1
massive Collection you have to search, you really want to have N
smaller Collection, say weekly, so you can search smaller
Collection(s).

For example:
A query that limits matches to docs from only the last 48 hours can be
routed only to the Collection for the latest/current week.
If the time range filter needs data from multiple Collections (e.g.
it's for the last 10 days and we have weekly collections), then
IDEALLY, you want to be able to send ONE request to Solr and specify 2
Collections to search and have Solr handle calling each Collection and
merging.

Yes, in case of full-text search global IDF would ideally be used, but
Solr is increasingly used for analytical queries and not just
full-text queries, and one doesn't need global IDF for that.

So: Can one query *multiple identical* Collections with one request
from the client?
If not: should I open a new JIRA issue?

I see https://issues.apache.org/jira/browse/SOLR-4497 allows aliasing
multiple Collections, which covers the use-case where you know which
Collections might be queried.  But in some cases you don't know that
ahead of time, so you can't prepare all the aliases.  In that case you
wold want to be able to list all Collections to search in the request
and that's it.

Maybe this is already doable?

Thanks,
Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Jack Park
Wow! That's been a while back, and it appears that my journal didn't
carry a good trace of what I did. Here's a reconstruction:

From my earlier attempt, which is reflected in this solrconfig.xml entry

requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2

notice that I am calling solrDirectUpdateHandler2 directly in defining
a requestHandler

I don't do that anymore. Now, it's this:

updateRequestProcessorChain name=harvest default=true

which took a lot of fishing to sort out, because, being somewhat
dyslexic, it took a long time to figure out that I can use harvest
as a setting in SolrJ, thus:

harvestServer = new HttpSolrServer(solrURL);
harvestServer.getHttpClient().getParams().setParameter("update.chain",
"harvest");

In short, the original exception was based on a gross
misinterpretation of how one goes about equating solrconfig.xml with
configurations of SolrJ.

Hope that helps more than it confuses!

Cheers
Jack

On Thu, Jun 27, 2013 at 9:45 AM, Mark Bennett
mark.benn...@lucidworks.com wrote:
 Jack,

 Did you ever find a fix for this?

 I'm having similar issues (different parts of solrconfig) and my guess is 
 it's a config issue somewhere, vs. a proper casting problem, some nested init 
 issue.

 Was curious what you found?


 On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.

 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:

 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst

 /requestHandler

 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.

 Jack

 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:

 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)

 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.

 The possibly relevant parts (my contributions) are these:

 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst

 /requestHandler

 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler

 Thanks
 Jack

 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…

 - Mark

 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:

 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.

 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.

 All the jars are in place; I'm able to import that class in my code.

 Is there any news on that issue?

 Many thanks
 Jack




state of new config format in 4.3.1

2013-06-27 Thread shikhar
Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes
the new solr.xml format usable in 4.4 but not 4.3.1?

If one didn't care about sharedLib or solr.xml persistence (the only
solr.xml changes we care about are addition of core's via the SolrCloud
API, so if that happens with core-discovery we're good) -- is there any
reason to not use the new format?


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Mark Bennett
For the record, in case anybody else hits this, I think the ClassCastException
problem had to do with which class loader first loads the class, which is a
side effect of which directory (or directories!) you put the jar file in.

I can't reproduce the problem any more, but I believe it went away when I 
removed copies of my jar from other lib directories which I had been 
experimenting with.

--
Mark Bennett / LucidWorks: Search  Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.
 
 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.
 
 Jack
 
 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:
 
 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
 
 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.
 
 The possibly relevant parts (my contributions) are these:
 
 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler
 
 Thanks
 Jack
 
 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…
 
 - Mark
 
 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:
 
 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.
 
 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.
 
 All the jars are in place; I'm able to import that class in my code.
 
 Is there any news on that issue?
 
 Many thanks
 Jack
 



Re: Querying across multiple *identical* Collections

2013-06-27 Thread Mark Miller
http://wiki.apache.org/solr/SolrCloud#Distributed_Requests
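
In particular, the collection parameter described there lets a single request
span several named collections, e.g. something along these lines (host and
collection names assumed):

  http://localhost:8983/solr/collection_2013w26/select?q=*:*&collection=collection_2013w25,collection_2013w26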

- Mark

On Jun 27, 2013, at 2:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi,
 
 This search across multiple collections question has come up a few
 times recently:
 
 http://search-lucene.com/m/2Q1BE0IT4Y/subj=Search+across+multiple+collections
 http://search-lucene.com/m/5JQrXIyhQQ1/subj=Querying+multiple+collections+in+SolrCloud
 
 One important variation of this Q is - can one search across MULTIPLE
 IDENTICAL collections.
 
 The use case is that you need to index/archive a lot of data, but
 because your searches have a time range filter, instead of having 1
 massive Collection you have to search, you really want to have N
 smaller Collection, say weekly, so you can search smaller
 Collection(s).
 
 For example:
 A query that limits matches to docs from only the last 48 hours can be
 routed only to the Collection for the latest/current week.
 If the time range filter needs data from multiple Collections (e.g.
 it's for the last 10 days and we have weekly collections), then
 IDEALLY, you want to be able to send ONE request to Solr and specify 2
 Collections to search and have Solr handle calling each Collection and
 merging.
 
 Yes, in case of full-text search global IDF would ideally be used, but
 Solr is increasingly used for analytical queries and not just
 full-text queries, and one doesn't need global IDF for that.
 
 So: Can one query *multiple identical* Collections with one request
 from the client?
 If not: should I open a new JIRA issue?
 
 I see https://issues.apache.org/jira/browse/SOLR-4497 allows aliasing
 multiple Collections, which covers the use-case where you know which
 Collections might be queried.  But in some cases you don't know that
 ahead of time, so you can't prepare all the aliases.  In that case you
 wold want to be able to list all Collections to search in the request
 and that's it.
 
 Maybe this is already doable?
 
 Thanks,
 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



Re: Searching and Retrieving Information Protocol For Solr

2013-06-27 Thread Otis Gospodnetic
HTTP?

Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Thu, Jun 27, 2013 at 7:40 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 There is a low level protocol that defines client–server protocol for
 searching and retrieving information from remote computer databases called
 as Z39.50. Due to Solr is a commonly used search engine (beside being a
 NoSQL database) is there any protocol for (I don't mean a low level
 protocol, z39.50 is just an example) Solr that it can integrate with other
 clients or anything else?


Re: state of new config format in 4.3.1

2013-06-27 Thread Mark Miller
There were a variety of little bugs - it will just be a bit of a land mine 
situation if you try and do it with 4.3.1.

If it ends up working for you, that's that.

- Mark

On Jun 27, 2013, at 3:22 PM, shikhar shik...@schmizz.net wrote:

 Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
 http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes
 the new solr.xml format usable in 4.4 but not 4.3.1?
 
 If one didn't care about sharedLib or solr.xml persistence (the only
 solr.xml changes we care about are addition of core's via the SolrCloud
 API, so if that happens with core-discovery we're good) -- is there any
 reason to not use the new format?



RE: shardkey

2013-06-27 Thread Joshi, Shital
Hi,

We finally decided on using custom sharding (implicit document routing) for our 
project. We will have ~3 mil documents per shardkey.  We're maintaining 
shardkey - shardid mapping in a database table. While adding documents we 
always specify _shard_ parameter in update URL but while querying,  we don't 
specify shards parameter. We want to search across shards. 

While experimenting we found that right after hard committing (commit=true in 
update URL), at times the query didn't return documents across shards (40% of 
the time), but many times (60% of the time) it returned documents across shards.
When queried after a few hours, the query always returned documents across
shards. Is that expected behavior? Is there a parameter to enforce querying 
across all shards? This is very important point for us to move further with 
SolrCloud. 

We're experimenting with adding a new shard and start directing all new 
documents to this new shard. Hopefully that should work.

Many Thanks! 

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, June 21, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital shital.jo...@gs.com wrote:
 But now Solr stores composite id in the document id

Correct, it's the document id itself that contains everything needed
for tje compositeId router to determine the hash.

 It would only use it to calculate hash key but while storing

compositeId routing is when it makes sense to make the routing part of
the unique id so that an id is all the information needed to find the
document in the cluster.  For example customer_id!document_name.  From
your example of 20130611!test_14 it looks like you're doing time based
sharding, and one would normally not use the compositeId router for
that.

-Yonik
http://lucidworks.com


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Shawn,

On Thu, Jun 27, 2013 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote:
 On 6/27/2013 5:59 AM, Robert Krüger wrote:
 sometime forcing oneself to describe a problem is the first step to a
 solution. I just realized that I also had an autocommit statement in
 my config with the exact same amount of time the seemed to be between
 the warnings.

 I removed that, because I don't think I really need it, and now the
 warnings are gone. So it seems it happened whenever my manual commits
 overlapped with an autocommit, which, of course, was more likely when
 many commits were issued in sequence.

 If all you are doing is soft commits, your transaction logs are going to
 grow out of control.

you are absolutely right. I was shooting myself in the foot with that change.

 http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

 My recommendation:

 1) Remove all commits from your indexing application.
 2) Configure autoCommit with values similar to that wiki page.
 3) Configure autoSoftCommit to happen often.

 The autoCommit must have openSearcher set to false.  For autoSoftCommit,
 include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs
 out.

I did that but without autoSoftCommit because I need control over when
the commits happen and soft-commit in my application.

Thank you so much,

Robert


Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread smanad
Hi, 
We have a need for normalized scores ranging
between 0 and 1 rather than over a free range.
I read about it at http://wiki.apache.org/lucene-java/ScoresAsPercentages and
it seems that's not something that is recommended.

However, is there still a way to set some config in solrconfig to make sure
scores are always between 0 and 1?
Or will I have to implement that logic in my code after I get the results
from Solr?
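
The client-side version of that logic is typically just a division by the
response's maxScore; a rough sketch follows (it assumes fl includes the score
pseudo-field, and carries all the caveats from the ScoresAsPercentages page
about such values not being comparable across queries):

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ScoreScaler {
  public static void printScaled(QueryResponse rsp) {
    SolrDocumentList results = rsp.getResults();
    Float max = results.getMaxScore();          // present only when score was requested
    for (SolrDocument doc : results) {
      float raw = ((Number) doc.getFieldValue("score")).floatValue();
      float scaled = (max != null && max > 0) ? raw / max : 0f;
      System.out.println(doc.getFieldValue("id") + " -> " + scaled);
    }
  }
}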

Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: shardkey

2013-06-27 Thread Mark Miller
You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?

Is the commit=true part of the request that adds documents? If so, it might be
SOLR-4923, and you should try sending the commit in a separate request after adding the docs.
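
In other words, send the adds without commit=true and then issue the commit on
its own, e.g. (host and collection are placeholders):

  curl 'http://host:8983/solr/collection1/update?commit=true'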

- Mark

On Jun 27, 2013, at 4:42 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi,
 
 We finally decided on using custom sharding (implicit document routing) for
 our project. We will have ~3 million documents per shard key. We're maintaining
 a shardkey-to-shardid mapping in a database table. While adding documents we
 always specify the _shard_ parameter in the update URL, but while querying we
 don't specify the shards parameter. We want to search across all shards.
 
 While experimenting we found that right after a hard commit (commit=true in the
 update URL), the query at times (40% of the time) didn't return documents across
 shards, but often (60% of the time) it did. When queried after a few hours, the
 query always returned documents across shards. Is that expected behavior? Is
 there a parameter to enforce querying across all shards? This is a very important
 point for us to move further with SolrCloud.
 
 We're experimenting with adding a new shard and start directing all new 
 documents to this new shard. Hopefully that should work.
 
 Many Thanks! 
 
 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Friday, June 21, 2013 8:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: shardkey
 
 On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital shital.jo...@gs.com wrote:
 But now Solr stores composite id in the document id
 
 Correct, it's the document id itself that contains everything needed
 for the compositeId router to determine the hash.
 
 It would only use it to calculate hash key but while storing
 
 compositeId routing is when it makes sense to make the routing part of
 the unique id so that an id is all the information needed to find the
 document in the cluster.  For example customer_id!document_name.  From
 your example of 20130611!test_14 it looks like you're doing time based
 sharding, and one would normally not use the compositeId router for
 that.
 
 -Yonik
 http://lucidworks.com



Re: Why there is no getter method for defaultCollection at CloudSolrServer?

2013-06-27 Thread Furkan KAMACI
I've created a JIRA and applied a patch for it:
https://issues.apache.org/jira/browse/SOLR-4973
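
(The missing accessor is trivial; presumably something along these lines, though
the actual change is whatever the SOLR-4973 patch contains:)

public String getDefaultCollection() {
  return defaultCollection;
}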

2013/6/12 Furkan KAMACI furkankam...@gmail.com

 Ok, I will create a JIRA for it.


 2013/6/11 Mark Miller markrmil...@gmail.com


 On Jun 11, 2013, at 4:51 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:

  Why there is no getter method for defaultCollection at CloudSolrServer?

 Want to create a JIRA issue to add it?

 - Mark





full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old

2013-06-27 Thread srinalluri
Hello,

I am using Solr 4.3.2 and an Oracle DB. The sub-entity uses
CachedSqlEntityProcessor, and the dataSource has batchSize=500. The
full-import failed with an 'ORA-01555: snapshot too old: rollback segment
number  with name  too small ORA-22924: snapshot too old' exception after
5 hours.

We have already increased the undo space 4 times on the database end. The
jan_story table has only 800,000 records. Tomcat runs with 4GB of JVM
memory.

Following is the entity (there are other sub-entities that I haven't included
here, since the import failed in the article_details entity; article_details
is the first sub-entity):

<entity name="par8-article-testingprod" dataSource="par8_prod" pk="VCMID"
        preImportDeleteQuery="content_type:article AND repository:par8qatestingprod"
        query="select ID as VCMID from jan_story">
  <entity name="article_details" dataSource="par8_prod"
          transformer="TemplateTransformer,ClobTransformer,RegexTransformer"
          query="select bb.recordid, aa.ID as DID, aa.STORY_TITLE, aa.STORY_HEADLINE,
                 aa.SOURCE, aa.DECK,
                 regexp_replace(aa.body, '&lt;p&gt;\[(pullquote|summary)\]&lt;/p&gt;|\[video [0-9]+?\]|\[youtube .+?\]', '') as BODY,
                 aa.PUBLISHED_DATE, aa.MODIFIED_DATE, aa.DATELINE, aa.REPORTER_NAME,
                 aa.TICKER_CODES, aa.ADVERTORIAL_CONTENT
                 from jan_story aa, mapp bb where aa.id=bb.keystring1"
          cacheKey="DID"
          cacheLookup="par8-article-testingprod.VCMID"
          processor="CachedSqlEntityProcessor">
    <field column="content_type" template="article" />
    <field column="RECORDID" name="native_id" />
    <field column="repository" template="par8qatestingprod" />
    <field column="STORY_TITLE" name="title" />
    <field column="DECK" name="description" clob="true" />
    <field column="PUBLISHED_DATE" name="date" />
    <field column="MODIFIED_DATE" name="last_modified_date" />
    <field column="BODY" name="body" clob="true" />
    <field column="SOURCE" name="source" />
    <field column="DATELINE" name="dateline" />
    <field column="STORY_HEADLINE" name="export_headline" />
  </entity>
</entity>


The full-import without CachedSqlEntityProcessor is taking 7 days. That is
why I am doing all this.





Re: Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread Kevin Osborn
There is no way that I am aware of to have Solr return scores between 0 and
1. Perhaps there is some way to implement a custom Scorer, but that is
overkill and would probably have adverse effects. Instead, just normalize
it in your results. Of course, since you read the link you included, you
realize that it is no longer really a score, but basically just a feel-good
measure. And that is what we do, along with some other logic.

-Kevin


On Thu, Jun 27, 2013 at 2:25 PM, smanad sma...@gmail.com wrote:

 Hi,
 We have a need where we would want normalized scores ranging
 between 0 and 1 rather than a free range.
 I read about it @ http://wiki.apache.org/lucene-java/ScoresAsPercentages and it
 seems like that's not something that is recommended.

 However, is there still a way to set some config in solrconfig to make sure
 scores are always between 0 and 1?
 Or will I have to implement that logic in my code after I get the results
 from Solr?

 Any pointers will be much appreciated.
 Thanks,
 -M







-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread Learner
Might not be useful but a work around would be to divide all scores by max
score to get scores between 0 and 1.
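
(A rough client-side sketch of that workaround in SolrJ, assuming the score
pseudo-field is requested via fl=*,score; the query string is a placeholder:)

SolrQuery query = new SolrQuery("some query").setFields("*", "score");
QueryResponse rsp = server.query(query);
SolrDocumentList results = rsp.getResults();
float max = results.getMaxScore();   // populated only when the score field is requested
for (SolrDocument doc : results) {
    float normalized = ((Float) doc.getFieldValue("score")) / max;
    // use the normalized value (0..1] instead of the raw score
}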





Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Learner
Thanks a lot for your response. I created a multithreaded program to create
and submit the documents in batches of 100 to an EmbeddedSolrServer, but for some
reason it takes more time to index the data than ConcurrentUpdateSolrServer does.
I was under the assumption that the embedded server would take less time than
HTTP calls, but I'm not sure why it takes more time...

Is there a way to speed up the indexing, e.g. by increasing a queue size
(something similar to ConcurrentUpdateSolrServer)?
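
(For what it's worth, ConcurrentUpdateSolrServer does let you size its internal
queue and thread pool; a rough sketch, with an illustrative URL and tuning values:)

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);
server.add(docs);              // adds are buffered and streamed by background threads
server.blockUntilFinished();   // wait for the queue to drain before committing
server.commit();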





Re: state of new config format in 4.3.1

2013-06-27 Thread shikhar
Thanks Mark, might give it a go, or probably just wait for 4.4 :)


On Thu, Jun 27, 2013 at 4:06 PM, Mark Miller markrmil...@gmail.com wrote:

 There were a variety of little bugs - it will just be a bit of a land mine
 situation if you try and do it with 4.3.1.

 If it ends up working for you, that's that.

 - Mark

 On Jun 27, 2013, at 3:22 PM, shikhar shik...@schmizz.net wrote:

  Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
  http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what
 makes
  the new solr.xml format usable in 4.4 but not 4.3.1?
 
  If one didn't care about sharedLib or solr.xml persistence (the only
 solr.xml changes we care about are the addition of cores via the SolrCloud
  API, so if that happens with core-discovery we're good) -- is there any
  reason to not use the new format?




Question on forming query when using switch parser plugin?

2013-06-27 Thread Learner
Hi,

I currently have a query as below. I am applying the fq (via the switch plugin)
only if the latlong value is not empty; otherwise I am not using the fq at all.

Whenever the latlong value is empty, I just use the value of the $where parameter
(in q) to return the results based on location.

Now whenever the latlong value is available I need to use both $where and the
values returned by $latlong (geospatial search). Currently the results first get
filtered based on 'q' and are then passed through the fq, so the documents returned
are always a subset of what 'q' returns. I need 'q' only to boost the score of the
documents.

Can someone let me know how to return the documents matching the fq
(without them getting filtered by 'q')?

Example:

If I search for a place like Charlotte, NC (by passing the latitude and
longitude with a distance of 20 miles), I get only the results belonging to
Charlotte, NC when I use the query below. I need to return all the results
based on distance. If I don't pass the latitude and longitude but just pass
Charlotte, the geospatial function won't kick in, hence the results will be
based just on the $where value in 'q'.

<lst name="defaults">
  <str name="q">
  (
    _query_:"{!cust1 qf=person_name_lname_i v=$lname}"^8.3 OR
    _query_:"{!cust1 qf=person_name_lname_phonetic_i v=$lname}"^8.6
  )
  (
    _query_:"{!cust df='addr_location_clean_i' qs=1 v=$where}"^6.2 OR
    _query_:"{!cust df='addr_location_i' qs=1 v=$where}"^6.2
  )
  </str>
</lst>

<lst name="appends">
  <str name="fq">{!switch case='*:*' default=$fq_bbox v=$latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">_query_:"{!bbox pt=$latlong sfield=geo d=$dist}"^0.2</str>
</lst>






Re: Configuring Solr to retrieve documents?

2013-06-27 Thread Gora Mohanty
On 27 June 2013 21:13, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Hi,

 I haven't used it yet, but I believe you can do this using the
 FileDataSource feature of DataImportHandler:

 http://wiki.apache.org/solr/DataImportHandler#FileDataSource
[...]

Please see other recent threads on similar topics
in this list: A FileDataSource is probably the way
to go, along with something like the PlainTextEntityProcessor
for text files, or TikaEntityProcessor for PDF/other
rich-text documents.

Regards,
Gora
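
(For reference, a rough data-config.xml sketch combining FileListEntityProcessor
with TikaEntityProcessor along those lines; the base directory, file pattern and
field names are placeholders:)

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
            baseDir="/path/to/docs" fileName=".*\.(pdf|doc|docx)" recursive="true">
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>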