Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Shawn Heisey
On 6/26/2013 11:25 PM, Sandeep Gupta wrote:
 To have a singleton design pattern for SolrServer object creation,
 I found that there are many ways described in
 http://en.wikipedia.org/wiki/Singleton_pattern
 So which of the 5 examples mentioned in the above URL is the best one for a
 web application in general practice?
 
 I am sure lots of people (on this mailing list) will have practical
 experience
 with which type of singleton pattern needs to be implemented for creating
 the SolrServer object.

I will admit that when I used the word singleton I honestly hadn't
looked it up to see what it really meant.  If you do use the full
meaning of singleton, you can do this in any way you want.

Perhaps a better thing to say is that you only need one SolrServer
object for each base URL (host/port/core combination).  Things are a
little bit different when it comes to SolrCloud - you can use one
CloudSolrServer object for the entire cloud, even if there are many
collections and many servers.

In my own SolrJ code, I create two HttpSolrServer objects within each of
my homegrown Core objects.  One of them is for operations against that
specific Solr core, the other is for CoreAdmin operations.

Because the URL for CoreAdmin operations is common to multiple cores, I
create a static Map with those server objects so that my Core objects
can share the SolrServer object used for CoreAdmin when they are on the
same server machine.

For the query side, if you're in a situation where you have one access
point to your Solr installation (a load balancer in front of replicating
Solr servers) and you only have one index, then you could create a
single static SolrServer object for your entire application.
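
For illustration, a minimal SolrJ sketch of that "one object per base URL"
idea (class and method names here are made up for the example, not taken from
my actual code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrServers {
    // one shared, thread-safe SolrServer per base URL, created lazily and reused
    private static final Map<String, SolrServer> SERVERS =
            new ConcurrentHashMap<String, SolrServer>();

    public static synchronized SolrServer get(String baseUrl) {
        SolrServer server = SERVERS.get(baseUrl);
        if (server == null) {
            server = new HttpSolrServer(baseUrl);
            SERVERS.put(baseUrl, server);
        }
        return server;
    }
}

Callers then share the same instance, e.g.
SolrServer core = SolrServers.get("http://host:8983/solr/collection1");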

Thanks,
Shawn



Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I have a relational database model.
This is the basic structure of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
        inner join TableB on ...">
  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
          where ResourceId = '${MyMainEntity.pId}'"></entity>
  <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
          from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
  <entity name="LibraryItem" pk="ResourceId"
          query="select SKU
                 FROM [TableB]
                 INNER JOIN ...
                 ON ...
                 INNER JOIN ...
                 ON ...
                 WHERE ... AND ...">
  </entity>
</entity>

Now, this takes a lot of time.
1 rows come back from the first query, and then the inner entities are
fetched for each of them (around 10 rows each).

If I use a DB profiler I see the three inner-entity queries running over
and over (3 select statements, then again 3 select statements, over and over).
This is really not efficient,
and the import can run over 40 hrs.
Now,
what are my options to make it run faster?
1. Obviously there is an option to flatten the tables into one big table - but
that will create a lot of other side effects.
I would really like to avoid that extra effort and run Solr on my
production relational tables.
So far it works great out of the box, and I am asking here if there
is a configuration tweak.
2. If I do flatten the rows - does the schema.xml need to be changed
too? Or will the fields that are multivalued keep being multivalued?

Thanks.


Re: Is there a way to speed up my import

2013-06-27 Thread Gora Mohanty
On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:

 I have a relational database model
 This is the basics of my data-config.xml

 <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
         inner join TableB on ...">
   <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
           where ResourceId = '${MyMainEntity.pId}'"></entity>
   <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
           from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
   <entity name="LibraryItem" pk="ResourceId"
           query="select SKU
                  FROM [TableB]
                  INNER JOIN ...
                  ON ...
                  INNER JOIN ...
                  ON ...
                  WHERE ... AND ...">
   </entity>
 </entity>

 Now, this takes a lot of time.
 1 rows come back from the first query, and then the inner entities are
 fetched for each of them (around 10 rows each).

 If I use a DB profiler I see the three inner-entity queries running over
 and over (3 select statements, then again 3 select statements, over and over).
 This is really not efficient,
 and the import can run over 40 hrs.
 Now,
 what are my options to make it run faster?
 1. Obviously there is an option to flatten the tables into one big table - but
 that will create a lot of other side effects.
 I would really like to avoid that extra effort and run Solr on my
 production relational tables.
 So far it works great out of the box, and I am asking here if there
 is a configuration tweak.
 2. If I do flatten the rows - does the schema.xml need to be changed
 too? Or will the fields that are multivalued keep being multivalued?

You have not shared your actual queries, so it is difficult
to tell, but my guess would be that it is the JOINs that
are the bottleneck rather than the SELECTs. You should
start with the following:
1. Profile queries from the database back-end to see
which are taking the most time, and try to simplify
them.
2. Make sure that relevant database columns are indexed.
This can make a huge difference, though going overboard
and indexing all columns might be counter-productive.
3. Use Solr DIH's CachedSqlEntityProcessor (see the sketch below):
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
4. Measure the time that Solr indexing takes: from your
description, you seem to be guessing at it.

In general, you should not flatten the records in the
database as that is supposed to be relational data.
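
As an illustration of point 3, an inner entity from your config could be
switched to the cached processor roughly like this (a sketch only; table and
column names are copied from your post, and note that the cache key column
has to be selected in the query):

<entity name="Entity1" pk="Id1"
        processor="CachedSqlEntityProcessor"
        query="SELECT ResourceId, [Text] Tag from [Table2]"
        where="ResourceId=MyMainEntity.pId">
</entity>

With this, the sub-query runs once, and rows for each parent are then looked
up from the in-memory cache instead of issuing one SELECT per parent row.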

Regards,
Gora


Solr admin search with wildcard

2013-06-27 Thread Amit Sela
I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards take a lot of resources, but I'm not worried
about that.

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Upayavira
I have done this - upgraded a 1.4 index to 3.x then on to 4.x. It
worked, but...

New field types have been introduced over time that facilitate new
functionality. To continue to use an upgraded index, you need to
continue using the old field types, and thus lose some of the coolness
of newer versions.

So, a re-index will set you in far better stead, if it is at all
possible.

Upayavira

On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:
 bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes
 
 Solr/Lucene explicitly try to read _one_ major revision backwards.
 Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
 able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
 Solr 1.4 indexes, so I wouldn't even try.
 
 Shalin's comment is best. If at all possible I'd just forget about
 reading the old index and re-index from scratch. But if you _do_
 try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
 at each step. That'll (I think) rewrite all the segments in the
 current format.
 
 Good luck!
 Erick
 
 On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
  You must carefully go through the upgrade instructions starting from
  1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
  3.1 to 4.0 should be given special attention.
 
  On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com wrote:
  Hello All,
 
  We are planning to migrate solr 1.4 to Solr 4.3 version.
  And I am seeking some help in this side.
 
  Considering Schema file change:
  By default there are lots of changes if I compare the original Solr 1.4 schema
  file to the Solr 4.3 schema file.
  And that is the reason we are not copy-pasting the schema file.
  In our Solr 1.4 schema implementation, we have some custom fields with type
  textgen and text
  So in migration of these custom fields to Solr 4.3,  should I use type of
  text_general as replacement of textgen and
  text_en as replacement of text?
  Please confirm the same.
 
  Please check the text_general definition in 4.3 against the textgen
  fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
  and text.
 
 
  Considering Solrconfig change:
  As we didn't have lots of changes in 1.4 solrconfig file except the
  dataimport request handler.
  And therefore on the migration side, we are simply modifying the Solr 4.3
  solrconfig file with this request handler.
 
  And you need to add the dataimporthandler jar into Solr's lib
  directory. DIH is not added automatically anymore.
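
  For illustration, the relevant solrconfig.xml pieces look roughly like this
  (a sketch; the lib path depends on where you put the jar, and the handler
  name is just the conventional one):

  <lib dir="../../dist/" regex="solr-dataimporthandler-.*\.jar" />

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>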
 
 
  Considering the application development:
 
  We used all the queries in a BOOLEAN style (which was not good), I mean we put
  all the parameters in the query field, i.e.
  *:* AND EntityName:  AND fileName:fieldValue AND .
 
  I think we should simplify our queries using other fields like df, qf 
 
 
  Probably. AND queries are best done by filter queries (fq).
 
  We also used to create Solr server object via CommonsHttpSolrServer() so I
  am planning to use now HttpSolrServer API
 
  Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
  the javabin format so old clients using javabin won't be able to
  communicate with Solr until you upgrade both solr client and solr
  servers.
 
 
  Please let me know the suggestion for above points also what are the other
  factors I need to take care while considering the migration.
 
  There is no substitute for reading the upgrade sections in the changes.txt.
 
  I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You
  will most likely need to re-index your documents.
 
  You should also think about switching to SolrCloud to take advantage
  of its features.
 
  --
  Regards,
  Shalin Shekhar Mangar.


Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Dotan Cohen
On a Solr 4.1 install I see that queries which use the fq parameter
take a long time (upwards of 120 seconds), both with the standard Lucene
query parser and also with edismax. I have added the {!cache=false}
localparam to the filter query, but this does not speed up the query.
Putting all the search terms in the main query returns results in
milliseconds.

Note that I am not using any wildcard queries, in each case I am
specifying the field to search and the terms to search on. Where
should I start to debug?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Guido Medina
I'm not a Hibernate fan either, to be honest, but in the Java world if
you have a good model-oriented design I'm sure you would prefer to map it to a
DB using JPA2, for example. In our case we use EclipseLink, which as a
JPA2 implementation I find simpler and faster than Hibernate. Now, I'm not sure
how many JPA2 implementations can be integrated with Solr/Lucene; several
years ago I developed a project nicely using Hibernate + Hibernate
Search with just Lucene (no Solr server).


In fact I have to apologize for advising Hibernate, but for some people
it might be a good start. Our company uses a polyglot design where I
have Riak + EclipseLink (objects mapped to PostgreSQL + an interceptor to
Riak), and for some objects Solr. I wish it was via annotations like in
Hibernate Search, because it is pretty ugly to convert back and forth to JSON
without any automation.


All this said, I too care about performance, but sometimes we want less
code, design patterns, and things to happen automatically, so Hibernate +
Hibernate Search (if that's the only capable implementation) might not
be a bad idea at all.


Guido.

On 27/06/13 03:14, Otis Gospodnetic wrote:

If hibernate search is like regular hibernate ORM I'm not sure I'd
trust it to pick the most optimal solutions...

Otis
Solr  ElasticSearch Support
http://sematext.com/
On Jun 26, 2013 4:44 PM, Guido Medina guido.med...@temetra.com wrote:


Never heard of embedded Solr server, isn't it better to just use Lucene alone
for that purpose? Using a helper like Hibernate? Since most applications
that require indexes will have a relational DB behind the scenes, it would
not be a bad idea to use an ORM combined with Lucene annotations (aka
hibernate-search).

Guido.

On 26/06/13 20:30, Alexandre Rafalovitch wrote:


Yes, it is possible by running an embedded Solr inside SolrJ process.
The nice thing is that the index is portable, so you can then access
it from the standalone Solr server later.

I have an example here:
https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj
, which shows SolrJ running both as a client and with an embedded
container. Notice that you will probably need more jars than you
expect for the standalone Solr to work, including a number of servlet
jars.
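
For reference, the embedded setup is along these lines (a minimal sketch, not
taken from the book's code; the solr home path and core name are assumptions,
and the CoreContainer bootstrap details vary a little between 4.x releases):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
    public static void main(String[] args) throws Exception {
        // solr home contains solr.xml plus the core's conf/ (solrconfig.xml, schema.xml)
        System.setProperty("solr.solr.home", "/path/to/solr-home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        SolrServer server = new EmbeddedSolrServer(container, "collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);
        server.commit();

        container.shutdown();
    }
}

The resulting index directory can afterwards be copied under a standalone
Solr core, which is what makes it portable.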

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: 
http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:


I currently have a SOLRJ program which I am using for indexing the data
in
SOLR. I am trying to figure out a way to build index without depending on
running instance of SOLR. I should be able to supply the solrconfig and
schema.xml to the indexing program which in turn create index files that
I
can use with any SOLR instance. Is it possible to implement this?



--
View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383.html
Sent from the Solr - User mailing list archive at Nabble.com.





Data Import Handler and Extract Handler

2013-06-27 Thread Venter, Scott
Hi all,

I am new to SOLR. I have been working through the SOLR 4 Cookbook and my 
experiences so far have been great.

I have worked through the extraction of PDF data recipe, and the Data import 
recipe. I would now like to join these two things, i.e. I would like to do a 
data import from a Database table of users, and then somehow associate indexed 
PDF data with rows that were imported.

I have a conceptual link between rows in the database and pdf documents, but I 
don't know how to make a physical link between the two in SOLR. For example, I 
know that user x has pdf documents a, b and c. 

If I have imported my users into SOLR using Data Import Handler, how would I

1) import and associate the pdf documents using the extract mechanism, in such 
a way that there is a link between user x and the 3 pdf documents as described 
above?

2) is there a better way to join a table of users to a set of pdf documents?

Thanks in advance
Scott.
This e-mail is subject to a disclaimer, available at 
http://www.rmb.co.za/web/elements.nsf/online/disclaimer-communications.html


Re: Is there a way to speed up my import

2013-06-27 Thread Mysurf Mail
I just configured it with the caching and it works mighty fast now.
Instead of an unbelievable number of queries it queries only 4 times.
CPU usage has moved from the DB to the Solr machine, but only for a very
short time.

Problem:
I don't see the multi-value fields (inner entities) anymore.
This is my configuration:

<entity name="PackageVersion" pk="PackageVersionId"
        query="select PackageVersion.Id PackageVersionId, ... from ...">
  <entity name="PackageTag" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="ResourceId =
          '${PackageVersion.PackageId}'"
          query="SELECT [Text] PackageTag from [dbo].[Tag]">
  </entity>
  <entity name="PackageVersionTag" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="ResourceId =
          PackageVersion.PackageVersionId"
          query="SELECT [Text] PackageVersionTag from [dbo].[Tag]">
  </entity>
  <entity name="LibraryItem" pk="ResourceId"
          processor="CachedSqlEntityProcessor" where="Asset.[PackageVersionId] =
          PackageVersion.PackageVersionId"
          query="select CatalogVendorPartNum SKU, LibraryItems.[Description] SKUDescription
                 FROM ...
                 INNER JOIN ...
                 ON Asset.Id = LibraryVendors.DesignProjectId
                 INNER JOIN ...
                 ON LibraryVendors.LibraryVendorId = LibraryItems.LibraryVendorId
                 WHERE Asset.[AssetTypeId]=1">
  </entity>
</entity>

Now, when I query
http://localhost:8983/solr/vaultCache/select?q=*&indent=true
it returns only the main entity attributes.
Where are my inner entities' attributes now?
Thanks a lot.







On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty g...@mimirtech.com wrote:

 On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:
 
  I have a relational database model
  This is the basics of my data-config.xml
 
  <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
          inner join TableB on ...">
    <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
            where ResourceId = '${MyMainEntity.pId}'"></entity>
    <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
            from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
    <entity name="LibraryItem" pk="ResourceId"
            query="select SKU
                   FROM [TableB]
                   INNER JOIN ...
                   ON ...
                   INNER JOIN ...
                   ON ...
                   WHERE ... AND ...">
    </entity>
  </entity>
 
  Now, this takes a lot of time.
  1 rows in the first query and then each other inner entities are
  fetched later (around 10 rows each).
 
  If I use a db profiler I see a the three inner entities query running
 over
  and over (3 select sentences than again 3 select sentences over and over)
  This is really not efficient.
  And the import can run over 40 hrs ()
  Now,
  What are my options to run it faster .
  1. Obviously there is an option to flat the tables to one big table - but
  that will create a lot of other side effects.
  I would really like to avoid that extra effort and run solr on my
  production relational tables.
  So far it works great out of the box and I am searching here if there
  is a configuration tweak.
  2. If I will flat the rows that - does the schema.xml need to be change
  too? or the same fields that are multivalued will keep being multivalued.

 You have not shared your actual queries, so it is difficult
 to tell, but my guess would be that it is the JOINs that
 are the bottle-neck rather than the SELECTs. You should
 start by:
 1. Profile queries from the database back-end to see
 which are taking the most time, and try to simplify
 them.
 2. Make sure that relevant database columns are indexed.
 This can make a huge difference, though going overboard
  in indexing all columns might be counter-productive.
 3. Use Solr DIH's CachedSqlEntityProcessor:
 http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
 4. Measure the time that Solr indexing takes: From your
 description, you seem to be guessing at it.

 In general, you should not flatten the records in the
 database as that is supposed to be relational data.

 Regards,
 Gora



Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Upayavira
If what you want to do is create an index that can later be used by
Solr, then create the index with Solr. Solr has constraints on how a
Lucene index is created that you would have to replicate, and that would
create a huge amount of work.

SolrJ does have the 'embedded mode' in which Solr itself runs in the
same JVM as the client - i.e. no HTTP transport.  It could be a useful
way to do off-line index creation.

I've never used it though, so can't vouch for it.

Upayavira

On Wed, Jun 26, 2013, at 09:43 PM, Guido Medina wrote:
 Never heard of embedded Solr server, isn't it better to just use Lucene
 alone for that purpose? Using a helper like Hibernate? Since most
 applications that require indexes will have a relational DB behind the
 scenes, it would not be a bad idea to use an ORM combined with Lucene
 annotations (aka hibernate-search).
 
 Guido.
 
 On 26/06/13 20:30, Alexandre Rafalovitch wrote:
  Yes, it is possible by running an embedded Solr inside SolrJ process.
  The nice thing is that the index is portable, so you can then access
  it from the standalone Solr server later.
 
  I have an example here:
  https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj
  , which shows SolrJ running both as a client and with an embedded
  container. Notice that you will probably need more jars than you
  expect for the standalone Solr to work, including a number of servlet
  jars.
 
  Regards,
 Alex.
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)
 
 
  On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:
  I currently have a SOLRJ program which I am using for indexing the data in
  SOLR. I am trying to figure out a way to build index without depending on
  running instance of SOLR. I should be able to supply the solrconfig and
  schema.xml to the indexing program which in turn create index files that I
  can use with any SOLR instance. Is it possible to implement this?
 
 
 
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: shard failure, leader transition took 11s (seems high?)

2013-06-27 Thread Daniel Collins
On thinking about this, isn't it a potentially more serious problem,
especially in view of the NRT support which Solr now offers?

If a server crashes (hard), ZK detects this using the heartbeat, and would
remove the /live_node, which would trigger a leader election for this
shard.
But if we soft shut it down, it seems that again we have to wait for the
instance to physically die (and the live_node to disappear) before we get a
leadership election.  For all the time between Jetty shutting down and that
happening, we have no valid leader for that shard (but ZK and the rest of
the cloud think we do).

Now searches to that shard are distributed round-robin (using the standard
Solr load balancer within Solr cloud) so they will see the failed node, and
immediately retry to another replica (and presumably work).
However, updates keep going to the (now dead) leader, shouldn't that error
in SolrCmdDistributor (forwarding update to
http://xx4:10600/solr/collection1/ failed - retrying) trigger an
election?  Retrying to a node which isn't available works if it was a
transient issue, but is that the more common case?

Maybe we have a more specialized case than most, but we have very frequent
updates and want (near) real-time indexing, we are trying to minimize
latency between index and search. We currently soft-commit every 1s to do
that and we might get several hundred stories during that second, so
failing all updates for 11s in our case is a serious issue.  I know the
Cloud has returned an error code so we know the updates have failed, but at
our application level, there is nothing else we can do, surely? Solr has to
send to the leader, but the leader isn't available, so shouldn't the cloud
be handling that?



On 24 June 2013 14:58, Daniel Collins danwcoll...@gmail.com wrote:

 Thanks Mark.  Yes, I expected some finite time for the leader to take
 over, just hadn't realized/comprehended that Jetty was already shutdown by
 this point...  Yes, I suppose the container has to stop sending requests to
 the context before it can shut the context down, so that's the window where
 the individual container knows its going down, but nothing else does (yet).
  Will try to have a think about that shutdown/stop API, I suspect we'll
 need it for production (yes we can retry but we are using soft-commit to
 get a NRT as we can, so a 10s pause isn't really acceptable in our case).


 On 24 June 2013 14:46, Mark Miller markrmil...@gmail.com wrote:

 It will take a short bit of a time before a new leader takes over when a
 leader goes - that's expected - how long it takes will vary. Some things
 will do short little retries to kind of deal with this, but you are alerted
 those updates failed, so you have to deal with that as you would other
 update fails on the client side. SolrCloud favors consistency over write
 availability. That's the short part where you lose write availability.

 To get a 'clean' shutdown - eg you want to bring the machine down, it
 didn't get hit by lightning - we have to add some specific clean stop API
 you can call first - by the time jetty (or whatever container) tells Solr
 it's shutting down, it's too late to pull the node out gracefully.

 I've danced around it in the past, but have never gotten to making that
 clean shutdown/stop API.

 - Mark

 On Jun 24, 2013, at 8:25 AM, Daniel Collins danwcoll...@gmail.com
 wrote:

  Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829
  patch), 4 shards, 2 replicas (leader + 1 other) per shard spread across
 8
  machines.
 
  We sent all our updates into a single instance, and we shutdown a leader
  for maintenance, expecting it to failover to the other replica.  What I
 saw
  was that when the leader shard went down, the instance taking updates
  started seeing rejections almost instantly, yet the cluster state
 changes
  didn't occur for several seconds.  During that time, we had no valid
 leader
  for one of our shards, so we were losing updates and queries.
 
  (shard4 leader)
  07:10:33,124 - xx4 (shard 4 leader) starts coming down.
  07:10:35,885 - cluster state change is detected
  07:10:37,172 - nsrchnj4 publishes itself as down
  07:10:37,869 - second cluster state change detected
  07:10:40,202 - closing searcher
  07:10:43,447 - cluster state change (live_nodes)
 
  (instance taking updates)
  07:10:33,443 - starts seeing rejections from xx4
  07:10:35,937 - detects a cluster state change (red herring)
  07:10:37,899 - detects another cluster state change
  07:10:43,478 - detects a live_nodes change (as shard4 leader is really
 down
  now)
  07:10:44,586 - detects that shard4 has no leader anymore
 
  (x8) - new shard4 leader
 
  07:10:32,981 - last story FROMLEADER (xx4)
  07:10:35,980 - cluster state change detected (red herring)
  07:10:37,975 - another cluster state change detected
  07:10:43,868 - running election process(!)
  07:10:44,069 - nsrchnj8 becomes leader, tries to sync from nsrchnj4
 (which
  is 

Re: Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Upayavira
can you give an example?

On Thu, Jun 27, 2013, at 09:08 AM, Dotan Cohen wrote:
 On a Solr 4.1 install I see that queries with use the fq parameter
 take a long time (upwards of 120 seconds), both on the standard Lucene
 query parser and also with edismax. I have added the {!cache=false}
 localparam to the filter query, but this does not speed up the query.
 Putting all the search terms in the main query returns results in
 miliseconds.
 
 Note that I am not using any wildcard queries, in each case I am
 specifying the field to search and the terms to search on. Where
 should I start to debug?
 
 --
 Dotan Cohen
 
 http://gibberish.co.il
 http://what-is-what.com


Re: Solr, Shards, multi cores and (reverse proxy)

2013-06-27 Thread medley

* I have created a new RequestHandler and added the list of the shards :

...
<str name="shards">localhost:8780/apache-solr/leg0,localhost:8780/apache-solr/leg1,localhost:8780/apache-solr/leg2,localhost:8780/apache-solr/leg3,localhost:8780/apache-solr/leg4,localhost:8780/apache-solr/leg5</str>
...


* In the url, I replaced shards=... by shards.qt

It is working well.
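
For anyone following along, the handler definition would be roughly like this
(a sketch; the handler name and the abbreviated shard list are placeholders):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">localhost:8780/apache-solr/leg0,localhost:8780/apache-solr/leg1,...</str>
  </lst>
</requestHandler>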

Thanks a lot for your help.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-cores-and-reverse-proxy-tp4072094p4073543.html
Sent from the Solr - User mailing list archive at Nabble.com.


Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Hi,

I have a desktop application where I am abusing Solr as an embedded
database, and I am quite happy with everything.
Performance is more than good enough for my use case, and Solr's query
capabilities match the requirements of my app quite well. However, I
get the well-known performance warnings (see subject) in the log
whenever I index a lot of documents, although I never experience any
performance problems (they might be hidden, though). The properties of my
app are:

- I (soft-)commit after every indexed item because I need the changes
to be visible immediately
- The commits are serialized
- I do not have any warming queries configured

I have read the FAQ but don't see anything that helps in my case. As I
said, I am happy with everything as it is, but the warning makes me a
bit nervous (and maybe at some point my customers too, when their logs are
full of those warnings). What could I do to eliminate it? Can I
configure only one searcher to be used, or anything like that?

Thanks for any hints,

Robert


Re: Need Help in migrating Solr version 1.4 to 4.3

2013-06-27 Thread Upayavira
As much as possible, use new configs. Take fieldType definitions from
your 4.x example dir, don't use the old ones. e.g. if you use the old
date field type, it won't be usable in various ways (e.g. in the MS()
function).

Upayavira

On Thu, Jun 27, 2013, at 11:00 AM, Sandeep Gupta wrote:
 Thanks again Shawn for your comments.
 
 I am a little worried about the multi-threading of the web application, which
 uses
 servlets.
 
 I also found one of your explanations (please confirm whether it is
 your comment) in
 http://lucene.472066.n3.nabble.com/Memory-problems-with-HttpSolrServer-td4060985.html
 for the question:
 http://stackoverflow.com/questions/11931179/httpsolrserver-instance-management
 
 As you correctly said, the creation of SolrServer objects depends on the
 number
 of shards/solrcores, and only after that does one need to think about an
 implementation which may use a singleton pattern.
 
 On my web application side, I have only one solrcore, which is the default
 one,
 collection1, so I will create one SolrServer object for my application.
 Sure, if we decide to go for SolrCloud, then I will also create just one
 object.
 
 Thanks Upayavira, yes I will do the re-index. Is there anything you want to
 suggest,
 as you did the same migration?
 
 Thanks
 Sandeep
 
 
 
 
 
 
 
 
 
 On Thu, Jun 27, 2013 at 1:33 PM, Upayavira u...@odoko.co.uk wrote:
 
  I have done this - upgraded a 1.4 index to 3.x then on to 4.x. It
  worked, but...
 
  New field types have been introduced over time that facilitate new
  functionality. To continue to use an upgraded index, you need to
   continue using the old field types, and thus lose some of the coolness
  of newer versions.
 
  So, a re-index will set you in far better stead, if it is at all
  possible.
 
  Upayavira
 
  On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:
   bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes
  
   Solr/Lucene explicitly try to read _one_ major revision backwards.
   Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be
   able to read Solr 3.x. No attempt is made to allow Solr 4.x to read
   Solr 1.4 indexes, so I wouldn't even try.
  
   Shalin's comment is best. If at all possible I'd just forget about
   reading the old index and re-index from scratch. But if you _do_
    try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize
   at each step. That'll (I think) rewrite all the segments in the
   current format.
  
   Good luck!
   Erick
  
   On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar
   shalinman...@gmail.com wrote:
You must carefully go through the upgrade instructions starting from
1.4 upto 4.3. In particular the instructions for 1.4 to 3.1 and from
3.1 to 4.0 should be given special attention.
   
On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com
  wrote:
Hello All,
   
We are planning to migrate solr 1.4 to Solr 4.3 version.
And I am seeking some help in this side.
   
Considering Schema file change:
By default there are lots of changes if I compare original Solr 1.4
  schema
file to Sol 4.3 schema file.
And that is the reason we are not copying paste of schema file.
In our Solr 1.4 schema implementation, we have some custom fields
  with type
textgen and text
So in migration of these custom fields to Solr 4.3,  should I use
  type of
text_general as replacement of textgen and
text_en as replacement of text?
Please confirm the same.
   
Please check the text_general definition in 4.3 against the textgen
fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en
and text.
   
   
Considering Solrconfig change:
As we didn't have lots of changes in 1.4 solrconfig file except the
dataimport request handler.
And therefore in migration side, we are simply modifying the Solr 4.3
solrconfig file with his request handler.
   
And you need to add the dataimporthandler jar into Solr's lib
directory. DIH is not added automatically anymore.
   
   
Considering the application development:
   
We used all the queries as BOOLEAN type style (was not good)  I mean
  put
all the parameter in query fields i.e
*:* AND EntityName:  AND fileName:fieldValue AND .
   
I think we should simplify our queries using other fields like df, qf
  
   
   
Probably. AND queries are best done by filter queries (fq).
   
We also used to create Solr server object via CommonsHttpSolrServer()
  so I
am planning to use now HttpSolrServer API
   
Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in
the javabin format so old clients using javabin won't be able to
communicate with Solr until you upgrade both solr client and solr
servers.
   
   
Please let me know the suggestion for above points also what are the
  other
factors I need to take care while considering the migration.
   
There is no substitute for reading the upgrade sections 

Searching and Retrieving Information Protocol For Solr

2013-06-27 Thread Furkan KAMACI
There is a low-level protocol, called Z39.50, that defines a client-server
protocol for searching and retrieving information from remote computer
databases. Since Solr is a commonly used search engine (besides being a
NoSQL database), is there any protocol for Solr (I don't mean a low-level
protocol, Z39.50 is just an example) through which it can integrate with
other clients or anything else?


Re: Is there a way to speed up my import

2013-06-27 Thread Gora Mohanty
On 27 June 2013 14:12, Mysurf Mail stammail...@gmail.com wrote:
 I just configured it with the caching and it works mighty fast now.
 Instead of an unbelievable number of queries it queries only 4 times.
 CPU usage has moved from the DB to the Solr machine, but only for a very
 short time.

 Problem :
 I dont see the multi value fields (Inner Entities) anymore
 This is  my configuration
[...]

Please check the syntax of your where clause against
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
Your inner entities should have clauses like
where="ResourceId=PackageVersion.PackageId".
I am also not sure why you have the strange
square brackets.
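
For example, the first inner entity rewritten that way would look roughly like
this (a sketch; note that the cache key column, ResourceId, also has to be
selected in the query so the cache can be built from it):

<entity name="PackageTag" pk="ResourceId"
        processor="CachedSqlEntityProcessor"
        query="SELECT ResourceId, [Text] PackageTag from [dbo].[Tag]"
        where="ResourceId=PackageVersion.PackageId">
</entity>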

Regards,
Gora


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Hi,

On Thu, Jun 27, 2013 at 12:23 PM, Robert Krüger krue...@lesspain.de wrote:
 Hi,

 I have a desktop application where I am abusing solr as an embedded
 database accessing it and I am quite happy with everything.
 Performance is more than goog enough for my use case and Solr's query
 capabilities match the requirements of my app quite well. However, I
 have the well-known performance warnings (see subject) in the log
 whenever I index a lot of documents, although I never experience any
 performance problems (might be hidden, though). The properties of my
 app are:

 - I (soft-)commit after every indexed item because I need the changes
 to be visible immediately
 - The commits are serialized
 - I do not have any warming queries configured

 I have read the FAQ but don't see anthing that helps in my case. As I
 said, I am happy with everything as it is but the warning makes me a
 bit nervous (and maybe at some point my customers when their logs are
 full of those warnings). What could I do to eliminate it? Can I
 configure only one searcher to be used or anything like that?

 Thanks for any hints,

 Robert

sometimes forcing oneself to describe a problem is the first step to a
solution. I just realized that I also had an autoCommit statement in
my config with the exact same interval that seemed to lie between
the warnings.

I removed that, because I don't think I really need it, and now the
warnings are gone. So it seems it happened whenever my manual commits
overlapped with an autocommit, which, of course, was more likely when
many commits were issued in sequence.
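
For reference, the two solrconfig.xml pieces involved look roughly like this
(a sketch only; the 15-second interval is illustrative, not my actual setting):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- with openSearcher left at its default of true, every autoCommit
       opens a new searcher and can overlap with explicit soft commits -->
  <autoCommit>
    <maxTime>15000</maxTime>
  </autoCommit>
</updateHandler>

<!-- cap on concurrently warming searchers; the PERFORMANCE WARNING is
     logged whenever more than one searcher is warming at once -->
<maxWarmingSearchers>2</maxWarmingSearchers>

Removing the autoCommit (as above) or setting <openSearcher>false</openSearcher>
inside it both stop the autocommit from opening extra searchers.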


displaying one result per domain

2013-06-27 Thread Wojciech Kapelinski
I'm looking for a neat solution to replace the default of multiple results
from a single domain in the SERP

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html  etc

with only one result per domain [the main URL by default]

somepage.com
otherpage.net
completelydifferentpage.org

Tried grouping by Carrot2 but it's not exactly what I'm looking for.

Thanks in advance.


Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky

No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to 
consider tokenizing or using ngrams to avoid the need for wildcards.


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards takes a lot of resources but I'm not worried
with that

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2 



how to delete on column of a doc in solr

2013-06-27 Thread anurag.jain
In my solr schema there is one dynamic field.

   <dynamicField name="jobs_*" type="float" indexed="true" stored="true"/>

So I have one doc value:

"docs": [
{
"last_name": "Jain",
"state_name": "rajasthan",
"mobile_no": 234534564621,
"id": 4,
"jobs_6554": 6554,

}, ...]
Now I just want to delete one column, meaning jobs_6554, not the complete doc.
How is that possible in Solr?

So after the delete, docs will be:

"docs": [
{
"last_name": "Jain",
"state_name": "rajasthan",
"mobile_no": 234534564621,
"id": 4
}, ...]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-delete-on-column-of-a-doc-in-solr-tp4073587.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: displaying one result per domain

2013-06-27 Thread Erik Hatcher
Extract the domain (the main URL you mention) into its own indexed field and 
use field collapsing/grouping: http://wiki.apache.org/solr/FieldCollapsing
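
For example, if the host goes into a field called "domain" (the field name
here is just an assumption), the query would be something like:

http://localhost:8983/solr/select?q=foo&group=true&group.field=domain&group.limit=1

which returns at most one document per distinct domain value.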

Erik

On Jun 27, 2013, at 08:18 , Wojciech Kapelinski wrote:

 I'm looking for a neat solution to replace default multiple results from
 single domain in SERP
 
 somepage.com/contact.html
 somepage.com/aboutus.html
 otherpage.net/info.html
 somepage.com/directions.html  etc
 
 with only one result per each domain [main URL by default]
 
 somepage.com
 otherpage.net
 completelydifferentpage.org
 
 Tried grouping by Carrot2 but it's not exactly what I'm looking for.
 
 Thanks in advance.



Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
The stored and indexed string is actually a URL like
"http://www.youtube.com/somethingsomething".
It looks like removing the quotes does the job: iframe:*youtube* - or am I
wrong? For now, performance is not an issue, but accuracy is, and I would
like to know, for example, how many URLs have an iframe source leading to
YouTube. So a query like iframe:*youtube* with max rows 10 or
something will return in the response's numFound field the total number of
pages that have an iframe tag with a source matching *youtube*, no?


On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:

 No, you cannot use wildcards within a quoted term.

 Tell us a little more about what your strings look like. You might want to
 consider tokenizing or using ngrams to avoid the need for wildcards.

 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 3:33 AM
 To: solr-user@lucene.apache.org
 Subject: Solr admin search with wildcard


 I'm looking to search (in the solr admin search screen) a certain field
 for:

 *youtube*

 I know that leading wildcards takes a lot of resources but I'm not worried
 with that

 My only question is about the syntax, would this work:

 field:*youtube* ?

 Thanks,

 I'm using Solr 3.6.2



Re: Solr admin search with wildcard

2013-06-27 Thread Jack Krupansky
Just copyField from the string field to a text field and use standard 
tokenization, then you can search the text field for youtube or even 
something that is a component of the URL path. No wildcard required.
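
In schema.xml that would be something along these lines (a sketch; the
iframe_text field name is made up, and text_general is the stock tokenized
field type from the example schema):

<field name="iframe" type="string" indexed="true" stored="true"/>
<field name="iframe_text" type="text_general" indexed="true" stored="false"/>
<copyField source="iframe" dest="iframe_text"/>

After re-indexing, check on the admin Analysis screen exactly which tokens
the URL is split into for that field type, and then query the iframe_text
field with a plain term instead of a leading wildcard.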


-- Jack Krupansky

-Original Message- 
From: Amit Sela

Sent: Thursday, June 27, 2013 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr admin search with wildcard

The stored and indexed string is actually a url like 
"http://www.youtube.com/somethingsomething".
It looks like removing the quotes does the job: iframe:*youtube* or am I
wrong ? For now, performance is not an issue, but accuracy is and I would
like to know for example how many URLS have iframe source leading to
YouTube for example. So query like: iframe:*youtube* with max rows 10 or
something will return in the response numFound field the total number of
pages that have a tag ifarme with a source matching *youtube, No ?


On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky 
j...@basetechnology.comwrote:



No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to
consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-Original Message- From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard


I'm looking to search (in the solr admin search screen) a certain field
for:

*youtube*

I know that leading wildcards takes a lot of resources but I'm not worried
with that

My only question is about the syntax, would this work:

field:*youtube* ?

Thanks,

I'm using Solr 3.6.2





Re: how to delete on column of a doc in solr

2013-06-27 Thread Jack Krupansky

Atomic update. For example:


curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: application/json' -d '
[{"id": "text-1", "text_ss": {"set": null}}]'

(From the book!)

That's for one document. If you want to do that for all documents, you will 
have to iterate yourself.


But... it sounds like you have arbitrary, unknown field names (dynamic). If 
you want to delete them, you will need to know the field name. You will have 
to write a loop that reads every document, figures out the dynamic field 
name, and then you can update with atomic update.
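
A minimal SolrJ sketch of the same atomic update for a single document (the
field name and id value are taken from the example above; the rest is
illustrative, and atomic updates need the <updateLog/> enabled in
solrconfig.xml plus stored fields):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RemoveFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "4");
        // a {"set": null} value tells Solr to remove the jobs_6554 field
        Map<String, Object> removeField = new HashMap<String, Object>();
        removeField.put("set", null);
        doc.addField("jobs_6554", removeField);

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}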


You may want to rethink your data model.

-- Jack Krupansky

-Original Message- 
From: anurag.jain

Sent: Thursday, June 27, 2013 8:28 AM
To: solr-user@lucene.apache.org
Subject: how to delete on column of a doc in solr

In my solr schema there is one dynamic field.

  dynamicField name=jobs_*  type=floatindexed=true
stored=true/
So I have one doc value,

docs: [
{
last_name: Jain,
state_name: rajasthan,
mobile_no: 234534564621,
id: 4,
jobs_6554: 6554,

},...]
Now I just want to delete one column, means jobs_6554 not the complete doc.
How it can possible in solr.

So after delete, docs will be.

docs: [
{
last_name: Jain,
state_name: rajasthan,
mobile_no: 234534564621,
id: 4
},...]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-delete-on-column-of-a-doc-in-solr-tp4073587.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: displaying one result per domain

2013-06-27 Thread Jack Krupansky
The URL Classify Update Processor can take a URL and split it into pieces, 
including the host name.


http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html

Unfortunately, the Javadoc is sparse, not even one example.

I have some examples in the book.

You can also use a regular expression tokenfilter to extract the host name 
as well.


And you can use standard Solr grouping to group by the field containing 
host name.


-- Jack Krupansky

-Original Message- 
From: Wojciech Kapelinski

Sent: Thursday, June 27, 2013 8:18 AM
To: solr-user@lucene.apache.org
Subject: displaying one result per domain

I'm looking for a neat solution to replace default multiple results from
single domain in SERP

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html  etc

with only one result per each domain [main URL by default]

somepage.com
otherpage.net
completelydifferentpage.org

Tried grouping by Carrot2 but it's not exactly what I'm looking for.

Thanks in advance. 



Re: Classic 4.2 master-slave replication not completing

2013-06-27 Thread Neal Ensor
Okay, I have done this (updated to 4.3.1 across master and four slaves; one
of these is my own PC for experiments, it is not being accessed by clients).

Just had a minor replication this morning, and all three slaves are stuck
again.  Replication supposedly started at 8:40, ended 30 seconds later or
so (on my local PC, set up identically to the other three slaves).  The
three slaves will NOT complete the roll-over to the new index.  All three
index folders have a write.lock and latest files are dated 8:40am (now it
is 8:54am, with no further activity in the index folders).  There exists an
index.2013062708461 (or some variation thereof) in all three slaves'
data folder.

The seemingly-relevant thread dump of a snappuller thread on each of
these slaves:

   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
   -
   
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
   - java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
   - java.util.concurrent.FutureTask.get(FutureTask.java:83)
   -
   
org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
   -
   org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
   -
   
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
   - org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
   -
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   -
   java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   - java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
   -
   
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
   -
   
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   -
   
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   - java.lang.Thread.run(Thread.java:662)


Here they sit.  My local PC slave replicated very quickly, switched over
to the new generation (206) immediately.  I am not sure why the three
slaves are dragging on this.  If there's any configuration elements or
other details you need, please let me know.  I can manually kick them by
reloading the core from the admin pages, but obviously I would like this to
be a hands-off process.  Any help is greatly appreciated; this has been
bugging me for some time now.



On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 A bunch of replication related issues were fixed in 4.2.1 so you're
 better off upgrading to 4.2.1 or later (4.3.1 is the latest release).

 On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:
  As a bit of background, we run a setup (coming from 3.6.1 to 4.2
 relatively
  recently) with a single master receiving updates with three slaves
 pulling
  changes in.  Our index is around 5 million documents, around 26GB in size
  total.
 
  The situation I'm seeing is this:  occasionally we update the master, and
  replication begins on the three slaves, seems to proceed normally until
 it
  hits the end.  At that point, it sticks; there's no messages going on
 in
  the logs, nothing on the admin page seems to be happening.  I sit there
 for
  sometimes upwards of 30 minutes, seeing no further activity in the index
  folder(s).   After a while, I go to the core admin page and manually
 reload
  the core, which catches it up.  It seems like the index readers /
 writers
  are not releasing the index otherwise?  The configuration is set to
 reopen;
  very occasionally this situation actually fixes itself after a longish
  period of time, but it seems very annoying.
 
  I had at first suspected this to be due to our underlying shared (SAN)
  storage, so we installed SSDs in all three slave machines, and moved the
  entire indexes to those.  It did not seem to affect this issue at all
  (additionally, I didn't really see the expected performance boost, but
  that's a separate issue entirely).
 
  Any ideas?  Any configuration details I might share/reconfigure?  Any
  suggestions are appreciated. I could also upgrade to the later 4.3+
  versions, if that might help.
 
  Thanks!
 
  Neal Ensor
  nen...@gmail.com



 --
 Regards,
 Shalin Shekhar Mangar.



Dot operater issue.

2013-06-27 Thread Srinivasa Chegu
Hi team,

When the user enters the search term h.e.r.b.a.l in the search textbox and
clicks the search button, the Solr search engine does not return any results.
As I can see, Solr is accepting the request parameter as h.e.r.b.a.l.
However, we have many records with the string h.e.r.b.a.l as part of the
product name.

Looks like there is an issue with the dot character in the search term. If we
enter the search term herbal then it returns search results.

Our requirement is that when the search term is h.e.r.b.a.l, it needs to
display results matching that dotted form.

Please help us with this issue.

Regards
Srinivas


::DISCLAIMER::


The contents of this e-mail and any attachment(s) are confidential and intended 
for the named recipient(s) only.
E-mail transmission is not guaranteed to be secure or error-free as information 
could be intercepted, corrupted,
lost, destroyed, arrive late or incomplete, or may contain viruses in 
transmission. The e mail and its contents
(with or without referred errors) shall therefore not attach any liability on 
the originator or HCL or its affiliates.
Views or opinions, if any, presented in this email are solely those of the 
author and may not necessarily reflect the
views or opinions of HCL or its affiliates. Any form of reproduction, 
dissemination, copying, disclosure, modification,
distribution and / or publication of this message without the prior written 
consent of authorized representative of
HCL is strictly prohibited. If you have received this email in error please 
delete it and notify the sender immediately.
Before opening any email and/or attachments, please check them for viruses and 
other defects.




Re: Dot operater issue.

2013-06-27 Thread Sandeep Mestry
Hi Sri,

This depends on how the fields (that hold the value) are defined and how
the query is generated.
Try running the query in solr console and use debug=true to see how the
query string is getting parsed.

If that doesn't help, then could you answer the following 3 questions relating
to your question:

1) field definition in schema.xml
2) solr query url
3) parser config from solrconfig.xml


Thanks,
Sandeep


On 27 June 2013 10:41, Srinivasa Chegu cheg...@hcl.com wrote:

 Hi team,

 When the user enter search term as h.e.r.b.a.l  in the search textbox
 and click on search button then  SOLR search engine is not returning any
  results found. As I can see SOLR is accepting the request parameter as
 h.e.r.b.a.l. However we have many records with the string h.e.r.b.a.l as
 part of the product name.

 Look like there is an issue with dot operator in the search term.  If we
 enter search term as herbal then it is returning search results .

 Our requirement is search term should be h.e.r.b.a.l then it needs to
 display results based on dot operator .

 Please help us on this issue.

 Regards
 Srinivas





TermVector and Sharding issue

2013-06-27 Thread Stanislav Sandalnikov
Hello everyone,

I saw that the ticket regarding this issue is still open (
https://issues.apache.org/jira/browse/SOLR-4479). There is last comment
that suggests to reindex documents with solr 4.2. I did reindex with 4.3
version but term vector still doesn't work producing null pointer
exception.

So, does anyone had the same problem? Is there a workaround?


Re: Solr admin search with wildcard

2013-06-27 Thread Amit Sela
Forgive my ignorance but I want to be sure: do I add <copyField
source="iframe" dest="text"/> to solrindex-mapping.xml,
so that my solrindex-mapping.xml looks like this:
<fields>
  <field dest="content" source="content"/>
  <field dest="title" source="title"/>
  <field dest="iframe" source="iframe"/>
  <field dest="host" source="host"/>
  <field dest="segment" source="segment"/>
  <field dest="boost" source="boost"/>
  <field dest="digest" source="digest"/>
  <field dest="tstamp" source="tstamp"/>
  <field dest="id" source="url"/>
  <copyField source="url" dest="url"/>
  <copyField source="iframe" dest="text"/>   <!-- the new line -->
</fields>
<uniqueKey>url</uniqueKey>

And what do you mean by standard tokenization ?

Thanks!


On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky j...@basetechnology.comwrote:

 Just copyField from the string field to a text field and use standard
 tokenization, then you can search the text field for youtube or even
 something that is a component of the URL path. No wildcard required.


 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 8:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr admin search with wildcard


 The stored and indexed string is actually a url like
 "http://www.youtube.com/somethingsomething".
 It looks like removing the quotes does the job: iframe:*youtube* or am I
 wrong ? For now, performance is not an issue, but accuracy is and I would
 like to know for example how many URLS have iframe source leading to
 YouTube for example. So query like: iframe:*youtube* with max rows 10 or
 something will return in the response numFound field the total number of
 pages that have a tag ifarme with a source matching *youtube, No ?


 On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  No, you cannot use wildcards within a quoted term.

 Tell us a little more about what your strings look like. You might want to
 consider tokenizing or using ngrams to avoid the need for wildcards.

 -- Jack Krupansky

 -Original Message- From: Amit Sela
 Sent: Thursday, June 27, 2013 3:33 AM
 To: solr-user@lucene.apache.org
 Subject: Solr admin search with wildcard


 I'm looking to search (in the solr admin search screen) a certain field
 for:

 *youtube*

 I know that leading wildcards takes a lot of resources but I'm not worried
 with that

 My only question is about the syntax, would this work:

 field:*youtube* ?

 Thanks,

 I'm using Solr 3.6.2





Re: Data Import Handler and Extract Handler

2013-06-27 Thread Gora Mohanty
On 27 June 2013 13:42, Venter, Scott scott.ven...@rmb.co.za wrote:
 Hi all,

 I am new to SOLR. I have been working through the SOLR 4 Cookbook and my 
 experiences so far have been great.

 I have worked through the extraction of PDF data recipe, and the Data import 
 recipe. I would now like to join these two things, i.e. I would like to do a 
 data import from a Database table of users, and then somehow associate 
 indexed PDF data with rows that were imported.

 I have a conceptual link between rows in the database and pdf documents, but 
 I don't know how to make a physical link between the two in SOLR. For 
 example, I know that user x has pdf documents a, b and c.

 If I have imported my users into SOLR using Data Import Handler, how would I

 1) import and associate the pdf documents using the extract mechanism, in 
 such a way that there is a link between user x and the 3 pdf documents as 
 described above?
[...]

Where are your PDF documents? Presumably on the filesystem
or available from a web service. What you can do is to have
two datasources in your DIH configuration file:
* The first one is a JdbcDataSource that extracts data from a
   database. Presumably, you already have this working.
* The second is a BinFileDataSource assuming that your
   PDF files are on the filesystem.
* In the top-level entity, select the user and the names of the
  associated PDF files.
* Use a nested inner entity with the dataSource attribute set
  to the BinFileDataSource, and use the TikaEntityProcessor
  to index the PDF files. The documentation on this is a little
  scattered, but see:
  http://wiki.apache.org/solr/TikaEntityProcessor
  
http://lucene.472066.n3.nabble.com/problem-to-indexing-pdf-directory-td3749554.html
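
Putting those pieces together, a data-config.xml for this might be shaped
roughly as follows (the table, column, and path names are invented for
illustration only):

  <dataConfig>
    <dataSource name="db" type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
    <dataSource name="bin" type="BinFileDataSource"/>
    <document>
      <entity name="user" dataSource="db"
              query="SELECT user_id, pdf_file FROM users">
        <field column="user_id" name="id"/>
        <entity name="pdf" dataSource="bin" processor="TikaEntityProcessor"
                url="/data/pdfs/${user.pdf_file}" format="text">
          <field column="text" name="pdf_text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>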

Regards,
Gora


Re: Filter queries taking a long time, even with cache disabled

2013-06-27 Thread Dotan Cohen
On Thu, Jun 27, 2013 at 12:14 PM, Upayavira u...@odoko.co.uk wrote:
 can you give an example?


Thank you. This is an example query:
select
?q=search_field:iraq
fq={!cache=false}search_field:love%20obama
defType=edismax

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Classic 4.2 master-slave replication not completing

2013-06-27 Thread Mark Miller
Odd - looks like it's stuck waiting to be notified that a new searcher is ready.

- Mark

On Jun 27, 2013, at 8:58 AM, Neal Ensor nen...@gmail.com wrote:

 Okay, I have done this (updated to 4.3.1 across master and four slaves; one
 of these is my own PC for experiments, it is not being accessed by clients).
 
 Just had a minor replication this morning, and all three slaves are stuck
 again.  Replication supposedly started at 8:40, ended 30 seconds later or
 so (on my local PC, set up identically to the other three slaves).  The
 three slaves will NOT complete the roll-over to the new index.  All three
 index folders have a write.lock and latest files are dated 8:40am (now it
 is 8:54am, with no further activity in the index folders).  There exists an
 index.2013062708461 (or some variation thereof) in all three slaves'
 data folder.
 
 The seemingly-relevant thread dump of a snappuller thread on each of
 these slaves:
 
   - sun.misc.Unsafe.park(Native Method)
   - java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
   -
   
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
   - java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
   - java.util.concurrent.FutureTask.get(FutureTask.java:83)
   -
   
 org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
   -
   org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
   -
   
 org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
   - org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
   -
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
   -
   java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
   - java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
   -
   
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
   -
   
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
   -
   
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
   - java.lang.Thread.run(Thread.java:662)
 
 
 Here they sit.  My local PC slave replicated very quickly, switched over
 to the new generation (206) immediately.  I am not sure why the three
 slaves are dragging on this.  If there's any configuration elements or
 other details you need, please let me know.  I can manually kick them by
 reloading the core from the admin pages, but obviously I would like this to
 be a hands-off process.  Any help is greatly appreciated; this has been
 bugging me for some time now.
 
 
 
 On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:
 
 A bunch of replication related issues were fixed in 4.2.1 so you're
 better off upgrading to 4.2.1 or later (4.3.1 is the latest release).
 
 On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:
 As a bit of background, we run a setup (coming from 3.6.1 to 4.2
 relatively
 recently) with a single master receiving updates with three slaves
 pulling
 changes in.  Our index is around 5 million documents, around 26GB in size
 total.
 
 The situation I'm seeing is this:  occasionally we update the master, and
 replication begins on the three slaves, seems to proceed normally until
 it
 hits the end.  At that point, it sticks; there's no messages going on
 in
 the logs, nothing on the admin page seems to be happening.  I sit there
 for
 sometimes upwards of 30 minutes, seeing no further activity in the index
 folder(s).   After a while, I go to the core admin page and manually
 reload
 the core, which catches it up.  It seems like the index readers /
 writers
 are not releasing the index otherwise?  The configuration is set to
 reopen;
 very occasionally this situation actually fixes itself after a longish
 period of time, but it seems very annoying.
 
 I had at first suspected this to be due to our underlying shared (SAN)
 storage, so we installed SSDs in all three slave machines, and moved the
 entire indexes to those.  It did not seem to affect this issue at all
 (additionally, I didn't really see the expected performance boost, but
 that's a separate issue entirely).
 
 Any ideas?  Any configuration details I might share/reconfigure?  Any
 suggestions are appreciated. I could also upgrade to the later 4.3+
 versions, if that might help.
 
 Thanks!
 
 

ConcurrentUpdateSolrServer hanging

2013-06-27 Thread qungg
Hi,

I'm using concurrentUpdateSolrServer to do my incremental indexing nightly.
I have 50 shards to index into, about 10,000 documents each night. I start
one concurrentUpdateSolrServer on each shards and start to send documents.
The queue size for concurrentUpdateSolrServer is 100, and 4 threads. At the
end of the import, I send a commit using the same
concurrentUpdateSolrServer. The problem is that some of the
concurrentUpdateSolrServer instances are not sending the commit to the shards,
and the import task hangs for a couple of hours.

So I looked at the log and found that the shards received about 1000
documents a couple of hours later, followed by a commit. Is there any
method I can call to flush out documents before I send the commit? Or are
there any existing issues related to concurrentUpdateSolrServer and
this behavior?

Thanks,
Qun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Shawn Heisey
On 6/27/2013 5:59 AM, Robert Krüger wrote:
 sometime forcing oneself to describe a problem is the first step to a
 solution. I just realized that I also had an autocommit statement in
 my config with the exact same amount of time the seemed to be between
 the warnings.
 
 I removed that, because I don't think I really need it, and now the
 warnings are gone. So it seems it happened whenever my manual commits
 overlapped with an autocommit, which, of course, was more likely when
 many commits were issued in sequence.

If all you are doing is soft commits, your transaction logs are going to
grow out of control.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

My recommendation:

1) Remove all commits from your indexing application.
2) Configure autoCommit with values similar to that wiki page.
3) Configure autoSoftCommit to happen often.

The autoCommit must have openSearcher set to false.  For autoSoftCommit,
include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs
out.
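
In solrconfig.xml that combination looks roughly like this (the numbers
below are only illustrative; see the wiki page for recommended values):

  <autoCommit>
    <maxTime>300000</maxTime>
    <maxDocs>25000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>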

Thanks,
Shawn



solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
I was using ConcurrentUpdateSOLR for indexing documents to Solr. Later I had
a need to do portable indexing hence started using Embedded solr server.

I created a multithreaded program to create/submit the documents in batches
of 100 to the embedded Solr server (running inside the SolrJ indexing process),
but for some reason it takes more time to index the data when compared with
ConcurrentUpdateSolrServer (CUSS). I was under the assumption that the embedded
server would take less time compared to an HTTP update (made when using CUSS),
but I'm not sure why it takes more time...

Is there a way to speed up the indexing when using Embedded solr
serveretc..(something like specifying thread and queue size similar to
CUSS)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR online reference document - WIKI

2013-06-27 Thread Luis Lebolo
This page never came up on any of my Google searches, so thanks for the
heads up! Looks good.

-Luis


On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote:

 I just came across a wonderful online reference wiki for SOLR and thought
 of
 sharing it with the community..


 https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SOLR online reference document - WIKI

2013-06-27 Thread Upayavira
It is all new, and as yet unreleased. It still has more work needed on
formatting, etc, so I guess you could say, make of it what you will, and
don't yet assume it will always be up and available.

Upayavira

On Thu, Jun 27, 2013, at 04:25 PM, Luis Lebolo wrote:
 This page never came up on any of my Google searches, so thanks for the
 heads up! Looks good.
 
 -Luis
 
 
 On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote:
 
  I just came across a wonderful online reference wiki for SOLR and thought
  of
  sharing it with the community..
 
 
  https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread Michael Della Bitta
Qun,

Are you using blockUntilFinished() and/or shutdown()?

One of the things to note is that a commit is just another document, so
writing a commit into the queue of the ConcurrentUpdateSolrServer isn't
enough to get it flushed out.
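
As a rough sketch of that pattern (the URL, queue size, and thread count
are placeholders):

import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ShardLoader {
  public static void load(String shardUrl, List<SolrInputDocument> docs) throws Exception {
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer(shardUrl, 100, 4);  // queue size 100, 4 sender threads
    for (SolrInputDocument doc : docs) {
      server.add(doc);            // buffered; background threads send it
    }
    server.blockUntilFinished();  // drain the queue before committing
    server.commit();              // the commit can no longer overtake queued documents
    server.shutdown();            // release the background threads
  }
}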


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Thu, Jun 27, 2013 at 10:21 AM, qungg qzheng1...@gmail.com wrote:

 Hi,

 I'm using concurrentUpdateSolrServer to do my incremental indexing nightly.
 I have 50 shards to index into, about 10,000 documents each night. I start
 one concurrentUpdateSolrServer on each shards and start to send documents.
 The queue size for concurrentUpdateSolrServer is 100, and 4 threads. At the
 end of the import, i will send commit using the same
 concurrentUpdateSolrServer. The problem is some of the
 concurrentUpdateSolrServer is not sending the commit to the shards and the
 import task hangs for a couple hours.

 So I looked at the log and find out that the shards received about 1000
 document couple hours later following with a commit. Is there anything
 methods I can call to flush out documents before I send the commit? Or are
 there any existing issue related to concurrentUpdateSolrServer related to
 this?

 Thanks,
 Qun



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Configuring Solr to retrieve documents?

2013-06-27 Thread Michael Della Bitta
Hi,

I haven't used it yet, but I believe you can do this using the
FileDataSource feature of DataImportHandler:

http://wiki.apache.org/solr/DataImportHandler#FileDataSource
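
A rough, unverified sketch of what that can look like in data-config.xml,
using FileListEntityProcessor to walk the directory (paths and field names
are made up); the import is then triggered with a DIH command such as
/dataimport?command=full-import rather than the post command:

  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
              dataSource="null" baseDir="/path/to/docs" fileName=".*\.txt" recursive="true">
        <entity name="file" processor="PlainTextEntityProcessor"
                url="${files.fileAbsolutePath}">
          <field column="plainText" name="content"/>
        </entity>
      </entity>
    </document>
  </dataConfig>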

HTH,


Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Wed, Jun 26, 2013 at 2:12 PM, aspielman aspiel...@gmail.com wrote:

 Is it possible to configure Solr to automatically grab documents in a
 specified directory, without having to use the post command?

 I've not found any way to do this, though admittedly, I'm not terribly
 experienced with config files of this type.

 Thanks!



 -
 | A.Spielman |
 In theory there is no difference between theory and practice. In practice
 there is. - Chuck Reid
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Configuring-Solr-to-retrieve-documents-tp4073372.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread qungg
Hi Michael,

I realized that I might have to use blockUntilFinished before commit, but do
I have to use shutdown as well??

Thanks,
Qun



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620p4073651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Mark Bennett
Jack,

Did you ever find a fix for this?

I'm having similar issues (different parts of solrconfig) and my guess is it's 
a config issue somewhere, vs. a proper casting problem, some nested init issue.

Was curious what you found?


On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.
 
 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.
 
 Jack
 
 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:
 
 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
 
 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.
 
 The possibly relevant parts (my contributions) are these:
 
 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler
 
 Thanks
 Jack
 
 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…
 
 - Mark
 
 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:
 
 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.
 
 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.
 
 All the jars are in place; I'm able to import that class in my code.
 
 Is there any news on that issue?
 
 Many thanks
 Jack
 



Re: solrj indexing using embedded solr is slow

2013-06-27 Thread Shawn Heisey

On 6/27/2013 9:19 AM, Learner wrote:

I was using ConcurrentUpdateSOLR for indexing documents to Solr. Later I had
a need to do portable indexing hence started using Embedded solr server.

I created a multithreaded program to create /submit the documents in batch
of 100 to Embedded SOLR server (running inside Solrj indexing process) but
for some reason it takes more time to index the data when compared with
ConcurrentUpdateSOLR server(CUSS). I was under assumption that embedded
server would take less time compared to http update (made when using CUSS)
but not sure why it takes more time...

Is there a way to speed up the indexing when using Embedded solr
serveretc..(something like specifying thread and queue size similar to
CUSS)?


A lot more time has been spent optimizing the traditional Solr server 
model than the embedded version.


If you want the same performance from Embedded that you get from 
Concurrent, you'll need to use that object in multiple threads that you 
create yourself.  The Concurrent object handles all that threading for 
you, but due to its nature, Embedded can't.  You say that your program 
is multithreaded, so I really don't know what's going on here.
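
For what it's worth, a bare-bones sketch of driving an EmbeddedSolrServer from
an explicit thread pool (the solr home path and core name are assumptions, and
error handling is kept minimal):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedIndexer {
  public static void index(List<List<SolrInputDocument>> batches) throws Exception {
    CoreContainer container = new CoreContainer("/path/to/solr/home");
    container.load();
    final EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (final List<SolrInputDocument> batch : batches) {   // e.g. batches of 100 docs
      pool.submit(new Runnable() {
        public void run() {
          try {
            server.add(batch);             // unlike CUSS, failures surface right here
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.commit();
    container.shutdown();
  }
}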


An FYI that on something that might have escaped your awareness: CUSS 
swallows exceptions - it will never inform the calling application about 
errors that occur, unless you override its handleError method in some 
way, and I don't know what is required to make it do that.  This is part 
of why CUSS is so fast - it returns to the calling application 
*immediately*, no matter what actually happens in the background while 
talking to the server.


Thanks,
Shawn



Re: ConcurrentUpdateSolrServer hanging

2013-06-27 Thread Shawn Heisey

On 6/27/2013 9:32 AM, Michael Della Bitta wrote:

Are you using blockUntilFinished() and/or shutdown()?

One of the things to note is that a commit is just another document, so
writing a commit into the queue of the ConcurrentUpdateSolrServer isn't
enough to get it flushed out.


ConcurrentUpdateSolrServer contains this little bit of code:

// this happens for commit...
if (req.getDocuments() == null || req.getDocuments().isEmpty()) {
  blockUntilFinished();
  return server.request(request);
}

Unless the comment is incorrect or there's a bug, sending a commit() 
will inherently do the blockUntilFinished().


Thanks,
Shawn



Re: URL search and indexing

2013-06-27 Thread Erick Erickson
Right, string fields are a little tricky, they're easy to confuse with
fields that actually _do_ something.

By default, norms and term frequencies are turned off for types based on '
class=solr.StrField '. So any field length normalization (i.e. terms that
appear in shorter fields count more) and term frequencies calculations are
_not_ include in the score calculation.

Try blowing your index away and adding this to your fields to see the
difference

omitNorms=false omitTermFreqAndPositions=false

You probably want to either turn these on explicitly for your string types
or use a type based on 'class=solr.TextField ' since these options
default to false for text fields. If you use something like
keywordTokenizerFactory you also won't get your URL split up into pieces.
And in that case you can also normalize the values with something like
lowerCaseFilter which you can't do with string types since they're
completely unanalyzed.
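
For instance, a type along those lines for a url field could look like this
(the type name is arbitrary); norms and term frequencies stay enabled because
it is a TextField:

<fieldType name="url_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="url" type="url_keyword" indexed="true" stored="true" required="true"/>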

Best
Erick


On Wed, Jun 26, 2013 at 11:34 AM, Flavio Pompermaier
pomperma...@okkam.itwrote:

 Obviously I messed up with email thread...however I found a problem
 indexing my document via post.sh.
 This is basically my schema.xml:

 <schema name="dopa-schema" version="1.5">
   <fields>
     <field name="url" type="string" indexed="true" stored="true"
            required="true" multiValued="false"/>
     <field name="itemid" type="string" indexed="true" stored="true"
            multiValued="true"/>
     <field name="_version_" type="long" indexed="true" stored="true"/>
   </fields>
   <uniqueKey>url</uniqueKey>
   <types>
     <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
     <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
                positionIncrementGap="0"/>
   </types>
 </schema>

 and this is the document I tried to upload via post.sh:

 <add>
   <doc>
     <field name="url">http://test.example.org/first.html</field>
     <field name="itemid">1000</field>
     <field name="itemid">1000</field>
     <field name="itemid">1000</field>
     <field name="itemid">5000</field>
   </doc>
   <doc>
     <field name="url">http://test.example.org/second.html</field>
     <field name="itemid">1000</field>
     <field name="itemid">5000</field>
   </doc>
 </add>

 When playing with administration and debugging tools I discovered that
 searching for q=itemid:5000 gave me the same score for those docs, while I
 was expecting different term frequencies between the first and the second.
 In fact, using Java to upload documents led to correct results (3
 occurrences of item 1000 in the first doc and 1 in the second), e.g.:
 document1.addField("itemid", 1000);
 document1.addField("itemid", 1000);
 document1.addField("itemid", 1000);

 Am I right or am I missing something else?


 On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  If there is a bug... we should identify it. What's a sample post command
  that you issued?
 
 
  -- Jack Krupansky
 
  -Original Message- From: Flavio Pompermaier
  Sent: Wednesday, June 26, 2013 10:53 AM
 
  To: solr-user@lucene.apache.org
  Subject: Re: URL search and indexing
 
  I was doing exactly that and, thanks to the administration page and
  explanation/debugging, I checked if results were those expected.
  Unfortunately, results were not correct submitting updates trough post.sh
  script (that use curl in the end).
  Probably, if it founds the same tag (same value for the same field-name),
  it will collapse them.
  Rewriting the same document in Java and submitting the updates did the
  things work correctly.
 
  In my opinion this is a bug (of the entire process, then I don't know it
  this is a problem of curl or of the script itself).
 
  Best,
  Flavio
 
  On Wed, Jun 26, 2013 at 4:18 PM, Erick Erickson erickerick...@gmail.com
 *
  *wrote:
 
   Flavio:
 
  You mention that you're new to Solr, so I thought I'd make sure
  you know that the admin/analysis page is your friend! I flat
  guarantee that as you try to index/search following the suggestions
  you'll scratch your head at your results and you'll discover that
  the analysis process isn't doing quite what you expect. The
  admin/analysis page shows you the transformation of the input
  at each stage, i.e. how the input is tokenized, what transformations
  are applied to each token etc. It's invaluable!
 
  Best
  Erick
 
  P.S. Feel free to un-check the verbose box, it provides lots
  of information but can be overwhelming, especially at first!
 
  On Wed, Jun 26, 2013 at 12:20 AM, Flavio Pompermaier
  pomperma...@okkam.it wrote:
   Ok thank you all for the great help!
   Now I'm ready to start playing with my index!
  
   Best,
   Flavio
  
  
   On Tue, Jun 25, 2013 at 11:40 PM, Jack Krupansky 
  j...@basetechnology.comwrote:
  
   Yeah, URL Classify does only do so much. That's why you need to
 combine
   multiple methods.
  
   As a fourth method, you could code up a short JavaScript **
   StatelessScriptUpdateProcessor that did something like take a
  full
   domain name (such as output by URL Classify) and turn it into
 multiple
   values, each with more of the prefix removed, so that 
  

Field Query After Collapse.Field?

2013-06-27 Thread slevytam
Hello,

I've struggling to find a way to query after collapse.field is performed and
I'm hoping someone can help.

I'm doing a multiple core(index) search which generates results that can
have varying fields.
ex.
entry_id, entry_starred
entry_id, entry_read

I perform a collapse.field on entry_id which yields:
ex. entry_id, entry_starred, entry_read

But if I try to do a fq on one of the fields
ex. fq=!entry_read:1

The fq is performed before the collapse leading to incorrect results.

Is there anyway to perform the field query after the results are collapsed?

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StatsComponent doesn't work if field's type is TextField - can I change field's type to String

2013-06-27 Thread Erick Erickson
I stand corrected, you're absolutely right about string types. But I still
don't think text types are supported, at least in my quick test of the
stock Solr distro, trying to gather stats on the subject field produced
the error below. Note that string is a completely unanalyzed type, no
tokenization etc. so it's actually a different beast than text types.

Field type
text_general{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={class=solr.TextField,
positionIncrementGap=100}} is not currently supported


On Wed, Jun 26, 2013 at 11:37 AM, Elran Dvir elr...@checkpoint.com wrote:

 Erick, thanks for the response.

 I think the stats component works with strings.

 In StatsValuesFactory, I see the following code:

 public static StatsValues createStatsValues(SchemaField sf) {
 ...
else if (StrField.class.isInstance(fieldType)) {
   return new StringStatsValues(sf);
 } 
   }

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Wednesday, June 26, 2013 5:30 PM
 To: solr-user@lucene.apache.org
 Subject: Re: StatsComponent doesn't work if field's type is TextField -
 can I change field's type to String

 From the stats component page:

 The stats component returns simple statistics for indexed numeric fields
 within the DocSet

 So string, text, anything non-numeric won't work. You can declare it
 multiValued but then you have to add multiple values for the field when you
 send the doc to Solr or implement a custom update component to break them
 up. At least there's no filter that I know of that takes a delimited set of
 numbers and transforms them.

 FWIW,
 Erick

 On Wed, Jun 26, 2013 at 4:14 AM, Elran Dvir elr...@checkpoint.com wrote:
  Hi all,
 
  StatsComponent doesn't work if field's type is TextField.
  I get the following message:
  Field type
  textstring{class=org.apache.solr.schema.TextField,analyzer=org.apache.
  solr.analysis.TokenizerChain,args={positionIncrementGap=100,
  sortMissingLast=true}} is not currently supported.
 
  My field configuration is:
 
  <fieldType name="mvstring" class="solr.TextField" positionIncrementGap="100"
             sortMissingLast="true">
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="\n"/>
    </analyzer>
  </fieldType>

  <field name="myField" type="mvstring" indexed="true" stored="false"
         multiValued="true"/>
 
  So, the reason my field is of type TextField is that in the document
 indexed there may be multiple values in the field separated by new lines.
  The tokenizer is splitting it to multiple values and the field is
 indexed as multi-valued field.
 
  Is there a way I can define the field as regular String field? Or a way
 to make StatsComponent work with TextField?
 
  Thank you very much.

 Email secured by Check Point



Re: Querying multiple collections in SolrCloud

2013-06-27 Thread Erick Erickson
I'd _guess_ that this is unsupported across collections if
for no other reason than scores really aren't comparable
across collections and the default ordering within groups
is score. This is really a federated search type problem.

But if it makes sense to use N collections for other reasons,
it's really the same thing as grouping functionally, you just
send a separate request to each collection and combine
the results of those N requests rather than from N
groups in a single query. If the collections are hosted on
different machines for instance, you might get quicker
overall response by firing off parallel queries,
It Depends (tm)...
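
A rough SolrJ sketch of that fan-out-and-merge approach (the collection URLs
are placeholders, and the merge criterion is left to the application since raw
scores aren't comparable across collections):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FanOutQuery {
  public static List<QueryResponse> query(final String queryText) throws Exception {
    List<String> urls = Arrays.asList(
        "http://host1:8983/solr/books",
        "http://host2:8983/solr/songs");
    ExecutorService pool = Executors.newFixedThreadPool(urls.size());
    List<Future<QueryResponse>> futures = new ArrayList<Future<QueryResponse>>();
    for (final String url : urls) {
      futures.add(pool.submit(new Callable<QueryResponse>() {
        public QueryResponse call() throws Exception {
          // same query against each collection, in parallel
          return new HttpSolrServer(url).query(new SolrQuery(queryText).setRows(20));
        }
      }));
    }
    List<QueryResponse> responses = new ArrayList<QueryResponse>();
    for (Future<QueryResponse> f : futures) {
      responses.add(f.get());     // merge getResults() by an application-level criterion
    }
    pool.shutdown();
    return responses;
  }
}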

Best
Erick


On Wed, Jun 26, 2013 at 1:46 PM, Chris Toomey ctoo...@gmail.com wrote:

 Thanks Erick, that's a very helpful answer.

 Regarding the grouping option, does that require all the docs to be put
 into a single collection, or could it be done with across N collections
 (assuming each collection had a common type field for grouping on)?

 Chris


 On Wed, Jun 26, 2013 at 7:01 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  bq: Would the above setup qualify as multiple compatible collections
 
  No. While there may be enough fields in common to form a single query,
  the TF/IDF calculations will not be compatible and the scores from the
  various collections will NOT be comparable. So simply getting the list of
  top N docs will probably be dominated by the docs from a single type.
 
  bq: How does SolrCloud combine the query results from multiple
 collections?
 
  It doesn't. SolrCloud sorts the results from multiple nodes in the
  _same_ collection
  according to whatever sort criteria are specified, defaulting to score.
  Say you
  ask for the top 20 docs. A node from each shard returns the top 20 docs
  for that
  shard. The node processing them just merges all the returned lists and
  only keeps
  the top 20.
 
  I don't think your last two questions are really relevant, SolrCloud
  isn't built to
  query multiple collections and return the results coherently.
 
  The root problem here is that you're trying to compare docs from
  different collections for goodness to return the top N. This isn't
  actually hard
  _except_ when goodness is the score, then it just doesn't work. You
 can't
  even compare scores from different queries on the _same_ collection, much
  less different ones. Consider two collections, books and songs. One
  consists
  of lots and lots of text and the term frequency and inverse doc freq
  (TF/IDF)
  will be hugely different than songs. Not to mention field length
  normalization.
 
  Now, all that aside there's an option. Index all the docs in a single
  collection and
  use grouping (aka field collapsing) to get a single response that has the
  top N
  docs from each type (they'll be in different sections of the original
  response) and present
  them to the user however makes sense. You'll get hands on experience in
  why this isn't something that's easy to do automatically if you try to
  sort these
  into a single list by relevance G...
 
  Best
  Erick
 
  On Tue, Jun 25, 2013 at 3:35 PM, Chris Toomey ctoo...@gmail.com wrote:
   Thanks Jack for the alternatives.  The first is interesting but has the
   downside of requiring multiple queries to get the full matching docs.
   The
   second is interesting and very simple, but has the downside of not
 being
   modular and being difficult to configure field boosting when the
   collections have overlapping field names with different boosts being
  needed
   for the same field in different document types.
  
   I'd still like to know about the viability of my original approach
 though
   too.
  
   Chris
  
  
   On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky 
 j...@basetechnology.com
  wrote:
  
   One simple scenario to consider: N+1 collections - one collection per
   document type with detailed fields for that document type, and one
  common
   collection that indexes a subset of the fields. The main user query
  would
   be an edismax over the common fields in that main collection. You
 can
   then display summary results from the common collection. You can also
  then
   support drill down into the type-specific collection based on a
 type
   field for each document in the main collection.
  
   Or, sure, you actually CAN index multiple document types in the same
   collection - add all the fields to one schema - there is no time or
  space
   penalty if most of the field are empty for most documents.
  
   -- Jack Krupansky
  
   -Original Message- From: Chris Toomey
   Sent: Tuesday, June 25, 2013 6:08 PM
   To: solr-user@lucene.apache.org
   Subject: Querying multiple collections in SolrCloud
  
  
   Hi, I'm investigating using SolrCloud for querying documents of
  different
   but similar/related types, and have read through docs. on the wiki and
  done
   many searches in these archives, but still have some questions.
  Thanks
  in
   advance for your help.
  
   

Change of email

2013-06-27 Thread abillavara

Dear List Managers
I've changed the email address that I'd like to use for the solr-user list, as
it's filling up my work email to the point of insanity.


Regardless of the change in the solr-user community, it still keeps
sending the emails of all threads and replies to my work email.  Would
you please be so kind as to effect this change for me?  The new email is a
yahoo email, and is already showing in my preferences.


Thank you kindly
Anria


Re: Replicating files containing external file fields

2013-06-27 Thread Erick Erickson
Haven't tried this, but I _think_ you can use the
confFiles trick with relative paths, see:
http://wiki.apache.org/solr/SolrReplication

Or just put your EFF files in the data dir?
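
For reference, the untested idea would be something like this in the master's
replication handler (the relative ../data path is exactly the part that needs
verifying; file names are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt,../data/external_myfield.txt</str>
  </lst>
</requestHandler>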

Best
Erick


On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan
arunrangara...@gmail.comwrote:

 From https://wiki.apache.org/solr/SolrReplication I understand that index
 dir and any files under the conf dir can be replicated to slaves. I want to
 know if there is any way the files under the data dir containing external
 file fields can be replicated. These are not replicated by default.
 Currently we are running the ext file field reload script on both the
 master and the slave and then running reloadCache on each server once they
 are loaded.



Re: solrj indexing using embedded solr is slow

2013-06-27 Thread Learner
Shawn,

Thanks a lot for your reply.

I have pasted my entire code below, it would be great if you can let me know
if I am doing anything wrong in terms of running the code in multithreaded
environment.

http://pastebin.com/WRLn3yWn



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636p4073711.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Change of email

2013-06-27 Thread Upayavira


On Thu, Jun 27, 2013, at 06:48 PM, abillav...@innoventsolutions.com
wrote:
 Dear List Managers
 I've changed my email that I'd like to use for the solr-user list, as 
 it's filling up my work email to the point of insanity.
 
 Regardless of the change in the solr-user community, it still keeps 
 sending the emails of all threads and replies to my work email.  Would 
 you please be so kind to affect this change for me?  The new email is a 
 yahoo email, and is already showing in my preferences

Simply unsubscribe yourself (mail
solr-user-unsubscr...@lucene.apache.org) from your work address. Then
subscribe from the new address.

If you have difficulties with unsubscribing, then a mail administrator
can help you sort it.

Upayavira


Querying across multiple *identical* Collections

2013-06-27 Thread Otis Gospodnetic
Hi,

This search across multiple collections question has come up a few
times recently:

http://search-lucene.com/m/2Q1BE0IT4Y/subj=Search+across+multiple+collections
http://search-lucene.com/m/5JQrXIyhQQ1/subj=Querying+multiple+collections+in+SolrCloud

One important variation of this Q is - can one search across MULTIPLE
IDENTICAL collections.

The use case is that you need to index/archive a lot of data, but
because your searches have a time range filter, instead of having 1
massive Collection you have to search, you really want to have N
smaller Collection, say weekly, so you can search smaller
Collection(s).

For example:
A query that limits matches to docs from only the last 48 hours can be
routed only to the Collection for the latest/current week.
If the time range filter needs data from multiple Collections (e.g.
it's for the last 10 days and we have weekly collections), then
IDEALLY, you want to be able to send ONE request to Solr and specify 2
Collections to search and have Solr handle calling each Collection and
merging.

Yes, in case of full-text search global IDF would ideally be used, but
Solr is increasingly used for analytical queries and not just
full-text queries, and one doesn't need global IDF for that.

So: Can one query *multiple identical* Collections with one request
from the client?
If not: should I open a new JIRA issue?

I see https://issues.apache.org/jira/browse/SOLR-4497 allows aliasing
multiple Collections, which covers the use-case where you know which
Collections might be queried.  But in some cases you don't know that
ahead of time, so you can't prepare all the aliases.  In that case you
wold want to be able to list all Collections to search in the request
and that's it.

Maybe this is already doable?

Thanks,
Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Jack Park
Wow! That's been a while back, and it appears that my journal didn't
carry a good trace of what I did. Here's a reconstruction:

From my earlier attempt, which is reflected in this solrconfig.xml entry

requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2

notice that I am calling solrDirectUpdateHandler2 directly in defining
a requestHandler

I don't do that anymore. Now, it's this:

updateRequestProcessorChain name=harvest default=true

which took a lot of fishing to sort out, because, being somewhat
dyslexic, it took a long time to figure out that I can use harvest
as a setting in SolrJ, thus:

harvestServer = new HttpSolrServer(solrURL);
harvestServer.getHttpClient().getParams().setParameter("update.chain",
"harvest");

In short, the original exception was based on a gross
misinterpretation of how one goes about equating solrconfig.xml with
configurations of SolrJ.

Hope that helps more than it confuses!

Cheers
Jack

On Thu, Jun 27, 2013 at 9:45 AM, Mark Bennett
mark.benn...@lucidworks.com wrote:
 Jack,

 Did you ever find a fix for this?

 I'm having similar issues (different parts of solrconfig) and my guess is 
 it's a config issue somewhere, vs. a proper casting problem, some nested init 
 issue.

 Was curious what you found?


 On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.

 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:

 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst

 /requestHandler

 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.

 Jack

 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:

 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)

 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.

 The possibly relevant parts (my contributions) are these:

 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain

 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst

 /requestHandler

 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler

 Thanks
 Jack

 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…

 - Mark

 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:

 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.

 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.

 All the jars are in place; I'm able to import that class in my code.

 Is there any news on that issue?

 Many thanks
 Jack




state of new config format in 4.3.1

2013-06-27 Thread shikhar
Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes
the new solr.xml format usable in 4.4 but not 4.3.1?

If one didn't care about sharedLib or solr.xml persistence (the only
solr.xml changes we care about are addition of core's via the SolrCloud
API, so if that happens with core-discovery we're good) -- is there any
reason to not use the new format?


Re: solr.DirectUpdateHandler2 failed to instantiate

2013-06-27 Thread Mark Bennett
For the record, in case anybody else hits this, I think the ClassCastException
problem had to do with which class loader first loads the class, which is a
side effect of which directory (or directories!) you put the jar file in.

I can't reproduce the problem any more, but I believe it went away when I 
removed copies of my jar from other lib directories which I had been 
experimenting with.

--
Mark Bennett / LucidWorks: Search  Big Data / mark.benn...@lucidworks.com
Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

 I can safely say that it is not DirectUpdateHandler2 failing;  By
 commenting out my own handlers, the system boots without error.
 
 This means that my handlers are problematic in some way. The moment I
 put back just one of my handlers:
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 The problem returns.  It simply appears that I cannot declare a named
 requestHandler using that class.
 
 Jack
 
 On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:
 Indeed! Perhaps the germane part is this, before the failure to
 instantiate notice:
 
 Caused by: java.lang.ClassCastException: class 
 org.apache.solr.update.DirectUpda
 teHandler2
at java.lang.Class.asSubclass(Unknown Source)
at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.
 java:432)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)
 
 This suggests that I might be doing something wrong elsewhere in 
 solrconfig.xml.
 
 The possibly relevant parts (my contributions) are these:
 
 updateRequestProcessorChain name=partial default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 updateRequestProcessorChain name=harvest default=true
  processor class=solr.RunUpdateProcessorFactory/
  processor
 class=org.apache.solr.update.TopicQuestsDocumentProcessFactory
str name=inputFieldhello/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 requestHandler name=/update/harvest
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainharvest/str
/lst
 
 /requestHandler
 
 requestHandler name=/update/partial
  class=solr.DirectUpdateHandler2
   lst name=defaults
 str name=update.chainpartial/str
   /lst
 /requestHandler
 
 Thanks
 Jack
 
 On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:
 There should be a stack trace - also, you shouldn't have to do anything 
 special to use this class. It's the default and only truly supported 
 implementation…
 
 - Mark
 
 On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:
 
 That messages gives great, but terrible google. Zillions of hits,
 mostly filled with very long log traces, and zero messages (that I
 could find) about what to do about it.
 
 I switched over to using that handler since it has an update log
 specified, and that's the only place I've found how to use update log.
 But, can't boot now.
 
 All the jars are in place; I'm able to import that class in my code.
 
 Is there any news on that issue?
 
 Many thanks
 Jack
 



Re: Querying across multiple *identical* Collections

2013-06-27 Thread Mark Miller
http://wiki.apache.org/solr/SolrCloud#Distributed_Requests
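
In particular, the collection parameter described there lets a single request
span several named collections, e.g. something along these lines (host and
collection names assumed):

  http://localhost:8983/solr/collection_2013w26/select?q=*:*&collection=collection_2013w25,collection_2013w26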

- Mark

On Jun 27, 2013, at 2:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com 
wrote:

 Hi,
 
 This search across multiple collections question has come up a few
 times recently:
 
 http://search-lucene.com/m/2Q1BE0IT4Y/subj=Search+across+multiple+collections
 http://search-lucene.com/m/5JQrXIyhQQ1/subj=Querying+multiple+collections+in+SolrCloud
 
 One important variation of this Q is - can one search across MULTIPLE
 IDENTICAL collections.
 
 The use case is that you need to index/archive a lot of data, but
 because your searches have a time range filter, instead of having 1
 massive Collection you have to search, you really want to have N
 smaller Collection, say weekly, so you can search smaller
 Collection(s).
 
 For example:
 A query that limits matches to docs from only the last 48 hours can be
 routed only to the Collection for the latest/current week.
 If the time range filter needs data from multiple Collections (e.g.
 it's for the last 10 days and we have weekly collections), then
 IDEALLY, you want to be able to send ONE request to Solr and specify 2
 Collections to search and have Solr handle calling each Collection and
 merging.
 
 Yes, in case of full-text search global IDF would ideally be used, but
 Solr is increasingly used for analytical queries and not just
 full-text queries, and one doesn't need global IDF for that.
 
 So: Can one query *multiple identical* Collections with one request
 from the client?
 If not: should I open a new JIRA issue?
 
 I see https://issues.apache.org/jira/browse/SOLR-4497 allows aliasing
 multiple Collections, which covers the use-case where you know which
 Collections might be queried.  But in some cases you don't know that
 ahead of time, so you can't prepare all the aliases.  In that case you
 wold want to be able to list all Collections to search in the request
 and that's it.
 
 Maybe this is already doable?
 
 Thanks,
 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



Re: Searching and Retrieving Information Protocol For Solr

2013-06-27 Thread Otis Gospodnetic
HTTP?

Otis
--
Solr  ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Thu, Jun 27, 2013 at 7:40 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 There is a low level protocol that defines client–server protocol for
 searching and retrieving information from remote computer databases called
 as Z39.50. Due to Solr is a commonly used search engine (beside being a
 NoSQL database) is there any protocol for (I don't mean a low level
 protocol, z39.50 is just an example) Solr that it can integrate with other
 clients or anything else?


Re: state of new config format in 4.3.1

2013-06-27 Thread Mark Miller
There were a variety of little bugs - it will just be a bit of a land mine 
situation if you try and do it with 4.3.1.

If it ends up working for you, that's that.

- Mark

On Jun 27, 2013, at 3:22 PM, shikhar shik...@schmizz.net wrote:

 Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
 http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes
 the new solr.xml format usable in 4.4 but not 4.3.1?
 
 If one didn't care about sharedLib or solr.xml persistence (the only
 solr.xml changes we care about are addition of core's via the SolrCloud
 API, so if that happens with core-discovery we're good) -- is there any
 reason to not use the new format?



RE: shardkey

2013-06-27 Thread Joshi, Shital
Hi,

We finally decided on using custom sharding (implicit document routing) for our 
project. We will have ~3 mil documents per shardkey.  We're maintaining 
shardkey - shardid mapping in a database table. While adding documents we 
always specify _shard_ parameter in update URL but while querying,  we don't 
specify shards parameter. We want to search across shards. 

While experimenting we found that right after hard committing (commit=true in 
update URL), at times the query didn't return documents across shards (40% of 
the time), but many times (60% of the time) it returned documents across shards.
When queried after a few hours, the query always returned documents across
shards. Is that expected behavior? Is there a parameter to enforce querying 
across all shards? This is very important point for us to move further with 
SolrCloud. 

We're experimenting with adding a new shard and start directing all new 
documents to this new shard. Hopefully that should work.

Many Thanks! 

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, June 21, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: Re: shardkey

On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital shital.jo...@gs.com wrote:
 But now Solr stores composite id in the document id

Correct, it's the document id itself that contains everything needed
for tje compositeId router to determine the hash.

 It would only use it to calculate hash key but while storing

compositeId routing is when it makes sense to make the routing part of
the unique id so that an id is all the information needed to find the
document in the cluster.  For example customer_id!document_name.  From
your example of 20130611!test_14 it looks like you're doing time based
sharding, and one would normally not use the compositeId router for
that.

-Yonik
http://lucidworks.com


Re: Is Overlapping onDeckSearchers=2 really a problem?

2013-06-27 Thread Robert Krüger
Shawn,

On Thu, Jun 27, 2013 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote:
 On 6/27/2013 5:59 AM, Robert Krüger wrote:
 sometime forcing oneself to describe a problem is the first step to a
 solution. I just realized that I also had an autocommit statement in
 my config with the exact same amount of time the seemed to be between
 the warnings.

 I removed that, because I don't think I really need it, and now the
 warnings are gone. So it seems it happened whenever my manual commits
 overlapped with an autocommit, which, of course, was more likely when
 many commits were issued in sequence.

 If all you are doing is soft commits, your transaction logs are going to
 grow out of control.

you are absolutely right. I was shooting myself in the foot with that change.

 http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

 My recommendation:

 1) Remove all commits from your indexing application.
 2) Configure autoCommit with values similar to that wiki page.
 3) Configure autoSoftCommit to happen often.

 The autoCommit must have openSearcher set to false.  For autoSoftCommit,
 include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs
 out.

I did that but without autoSoftCommit because I need control over when
the commits happen and soft-commit in my application.

Thank you so much,

Robert


Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread smanad
Hi, 
We have a need for normalized scores ranging
between 0 and 1 rather than over a free range.
I read about it at http://wiki.apache.org/lucene-java/ScoresAsPercentages and
it seems that's not something that is recommended.

However, is there still a way to set some config in solrconfig to make sure
scores are always between 0 and 1?
Or will I have to implement that logic in my code after I get the results
from Solr?
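
The client-side version of that logic is typically just a division by the
response's maxScore; a rough sketch follows (it assumes fl includes the score
pseudo-field, and carries all the caveats from the ScoresAsPercentages page
about such values not being comparable across queries):

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class ScoreScaler {
  public static void printScaled(QueryResponse rsp) {
    SolrDocumentList results = rsp.getResults();
    Float max = results.getMaxScore();          // present only when score was requested
    for (SolrDocument doc : results) {
      float raw = ((Number) doc.getFieldValue("score")).floatValue();
      float scaled = (max != null && max > 0) ? raw / max : 0f;
      System.out.println(doc.getFieldValue("id") + " -> " + scaled);
    }
  }
}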

Any pointers will be much appreciated.
Thanks, 
-M



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: shardkey

2013-06-27 Thread Mark Miller
You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?

Is the commit=true part of the request that adds documents? If so, it might be
SOLR-4923, and you should try sending the commit in a separate request after adding the docs.
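
In other words, send the adds without commit=true and then issue the commit on
its own, e.g. (host and collection are placeholders):

  curl 'http://host:8983/solr/collection1/update?commit=true'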

- Mark

On Jun 27, 2013, at 4:42 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hi,
 
 We finally decided on using custom sharding (implicit document routing) for
 our project. We will have ~3 million documents per shard key. We're maintaining
 a shardkey-to-shardid mapping in a database table. While adding documents we
 always specify the _shard_ parameter in the update URL, but while querying we
 don't specify the shards parameter. We want to search across all shards.
 
 While experimenting we found that right after a hard commit (commit=true in the
 update URL), the query at times (40% of the time) didn't return documents across
 shards, but often (60% of the time) it did. When queried after a few hours, the
 query always returned documents across shards. Is that expected behavior? Is
 there a parameter to enforce querying across all shards? This is a very important
 point for us to move further with SolrCloud.
 
 We're experimenting with adding a new shard and start directing all new 
 documents to this new shard. Hopefully that should work.
 
 Many Thanks! 
 
 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Friday, June 21, 2013 8:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: shardkey
 
 On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital shital.jo...@gs.com wrote:
 But now Solr stores composite id in the document id
 
 Correct, it's the document id itself that contains everything needed
 for the compositeId router to determine the hash.
 
 It would only use it to calculate hash key but while storing
 
 compositeId routing is when it makes sense to make the routing part of
 the unique id so that an id is all the information needed to find the
 document in the cluster.  For example customer_id!document_name.  From
 your example of 20130611!test_14 it looks like you're doing time based
 sharding, and one would normally not use the compositeId router for
 that.
 
 -Yonik
 http://lucidworks.com



Re: Why there is no getter method for defaultCollection at CloudSolrServer?

2013-06-27 Thread Furkan KAMACI
I've created a JIRA and applied a patch for it:
https://issues.apache.org/jira/browse/SOLR-4973
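
(The missing accessor is trivial; presumably something along these lines, though
the actual change is whatever the SOLR-4973 patch contains:)

public String getDefaultCollection() {
  return defaultCollection;
}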

2013/6/12 Furkan KAMACI furkankam...@gmail.com

 Ok, I will create a JIRA for it.


 2013/6/11 Mark Miller markrmil...@gmail.com


 On Jun 11, 2013, at 4:51 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:

  Why there is no getter method for defaultCollection at CloudSolrServer?

 Want to create a JIRA issue to add it?

 - Mark





full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old

2013-06-27 Thread srinalluri
Hello,

I am using Solr 4.3.2 and an Oracle DB. The sub-entity uses
CachedSqlEntityProcessor, and the dataSource has batchSize=500. The
full-import failed with an 'ORA-01555: snapshot too old: rollback segment
number  with name  too small ORA-22924: snapshot too old' exception after
5 hours.

We have already increased the undo space 4 times on the database end. The
jan_story table has only 800,000 records. Tomcat runs with 4GB of JVM
memory.

Following is the entity (there are other sub-entities that I haven't included
here, since the import failed in the article_details entity; article_details
is the first sub-entity):

<entity name="par8-article-testingprod" dataSource="par8_prod" pk="VCMID"
        preImportDeleteQuery="content_type:article AND repository:par8qatestingprod"
        query="select ID as VCMID from jan_story">
  <entity name="article_details" dataSource="par8_prod"
          transformer="TemplateTransformer,ClobTransformer,RegexTransformer"
          query="select bb.recordid, aa.ID as DID, aa.STORY_TITLE, aa.STORY_HEADLINE,
                 aa.SOURCE, aa.DECK,
                 regexp_replace(aa.body, '&lt;p&gt;\[(pullquote|summary)\]&lt;/p&gt;|\[video [0-9]+?\]|\[youtube .+?\]', '') as BODY,
                 aa.PUBLISHED_DATE, aa.MODIFIED_DATE, aa.DATELINE, aa.REPORTER_NAME,
                 aa.TICKER_CODES, aa.ADVERTORIAL_CONTENT
                 from jan_story aa, mapp bb where aa.id=bb.keystring1"
          cacheKey="DID"
          cacheLookup="par8-article-testingprod.VCMID"
          processor="CachedSqlEntityProcessor">
    <field column="content_type" template="article" />
    <field column="RECORDID" name="native_id" />
    <field column="repository" template="par8qatestingprod" />
    <field column="STORY_TITLE" name="title" />
    <field column="DECK" name="description" clob="true" />
    <field column="PUBLISHED_DATE" name="date" />
    <field column="MODIFIED_DATE" name="last_modified_date" />
    <field column="BODY" name="body" clob="true" />
    <field column="SOURCE" name="source" />
    <field column="DATELINE" name="dateline" />
    <field column="STORY_HEADLINE" name="export_headline" />
  </entity>
</entity>


The full-import without CachedSqlEntityProcessor is taking 7 days. That is
why I am doing all this.





Re: Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread Kevin Osborn
There is no way that I am aware of to have Solr return scores between 0 and
1. Perhaps there is some way to implement a custom Scorer, but that is
overkill and would probably have adverse effects. Instead, just normalize
it in your results. Of course, since you read the link you included, you
realize that it is no longer really a score, but basically just a feel-good
measure. And that is what we do, along with some other logic.

-Kevin


On Thu, Jun 27, 2013 at 2:25 PM, smanad sma...@gmail.com wrote:

 Hi,
 We have a need where we would want normalized scores ranging
 between 0 and 1 rather than a free range.
 I read about it @ http://wiki.apache.org/lucene-java/ScoresAsPercentages and it
 seems like that's not something that is recommended.

 However, is there still a way to set some config in solrconfig to make sure
 scores are always between 0 and 1?
 Or will I have to implement that logic in my code after I get the results
 from Solr?

 Any pointers will be much appreciated.
 Thanks,
 -M







-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: Normalizing/Returning solr scores between 0 to 1

2013-06-27 Thread Learner
Might not be useful but a work around would be to divide all scores by max
score to get scores between 0 and 1.
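
(A rough client-side sketch of that workaround in SolrJ, assuming the score
pseudo-field is requested via fl=*,score; the query string is a placeholder:)

SolrQuery query = new SolrQuery("some query").setFields("*", "score");
QueryResponse rsp = server.query(query);
SolrDocumentList results = rsp.getResults();
float max = results.getMaxScore();   // populated only when the score field is requested
for (SolrDocument doc : results) {
    float normalized = ((Float) doc.getFieldValue("score")) / max;
    // use the normalized value (0..1] instead of the raw score
}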





Re: Is there a way to build indexes using SOLRJ without SOLR instance?

2013-06-27 Thread Learner
Thanks a lot for your response. I created a multithreaded program to create
and submit the documents in batches of 100 to an EmbeddedSolrServer, but for some
reason it takes more time to index the data than ConcurrentUpdateSolrServer does.
I was under the assumption that the embedded server would take less time than
HTTP calls, but I'm not sure why it takes more time...

Is there a way to speed up the indexing, e.g. by increasing a queue size
(something similar to ConcurrentUpdateSolrServer)?
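
(For what it's worth, ConcurrentUpdateSolrServer does let you size its internal
queue and thread pool; a rough sketch, with an illustrative URL and tuning values:)

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);
server.add(docs);              // adds are buffered and streamed by background threads
server.blockUntilFinished();   // wait for the queue to drain before committing
server.commit();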





Re: state of new config format in 4.3.1

2013-06-27 Thread shikhar
Thanks Mark, might give it a go, or probably just wait for 4.4 :)


On Thu, Jun 27, 2013 at 4:06 PM, Mark Miller markrmil...@gmail.com wrote:

 There were a variety of little bugs - it will just be a bit of a land mine
 situation if you try and do it with 4.3.1.

 If it ends up working for you, that's that.

 - Mark

 On Jun 27, 2013, at 3:22 PM, shikhar shik...@schmizz.net wrote:

  Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt
  http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what
 makes
  the new solr.xml format usable in 4.4 but not 4.3.1?
 
  If one didn't care about sharedLib or solr.xml persistence (the only
 solr.xml changes we care about are the addition of cores via the SolrCloud
  API, so if that happens with core-discovery we're good) -- is there any
  reason to not use the new format?




Question on forming query when using switch parser plugin?

2013-06-27 Thread Learner
Hi,

I currently have a query as below. I am applying the fq (via the switch plugin)
only if the latlong value is not empty; otherwise I am not using the fq at all.

Whenever the latlong value is empty, I just use the value of the $where parameter
(in q) to return the results based on location.

Now whenever the latlong value is available I need to use both $where and the
values returned by $latlong (geospatial search). Currently the results first get
filtered based on 'q' and are then passed through the fq, so the documents returned
are always a subset of what 'q' returns. I need 'q' only to boost the score of the
documents.

Can someone let me know how to return the documents matching the fq
(without them getting filtered by 'q')?

Example:

If I search for a place like Charlotte, NC (by passing the latitude and
longitude with a distance of 20 miles), I get only the results belonging to
Charlotte, NC when I use the query below. I need to return all the results
based on distance. If I don't pass the latitude and longitude but just pass
Charlotte, the geospatial function won't kick in, hence the results will be
based just on the $where value in 'q'.

<lst name="defaults">
  <str name="q">
  (
    _query_:"{!cust1 qf=person_name_lname_i v=$lname}"^8.3 OR
    _query_:"{!cust1 qf=person_name_lname_phonetic_i v=$lname}"^8.6
  )
  (
    _query_:"{!cust df='addr_location_clean_i' qs=1 v=$where}"^6.2 OR
    _query_:"{!cust df='addr_location_i' qs=1 v=$where}"^6.2
  )
  </str>
</lst>

<lst name="appends">
  <str name="fq">{!switch case='*:*' default=$fq_bbox v=$latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">_query_:"{!bbox pt=$latlong sfield=geo d=$dist}"^0.2</str>
</lst>






Re: Configuring Solr to retrieve documents?

2013-06-27 Thread Gora Mohanty
On 27 June 2013 21:13, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Hi,

 I haven't used it yet, but I believe you can do this using the
 FileDataSource feature of DataImportHandler:

 http://wiki.apache.org/solr/DataImportHandler#FileDataSource
[...]

Please see other recent threads on similar topics
in this list: A FileDataSource is probably the way
to go, along with something like the PlainTextEntityProcessor
for text files, or TikaEntityProcessor for PDF/other
rich-text documents.

Regards,
Gora
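
(For reference, a rough data-config.xml sketch combining FileListEntityProcessor
with TikaEntityProcessor along those lines; the base directory, file pattern and
field names are placeholders:)

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor" rootEntity="false"
            baseDir="/path/to/docs" fileName=".*\.(pdf|doc|docx)" recursive="true">
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>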