Re: broken links returned from solr search

2013-06-29 Thread gilawem
OK thanks. So I guess I will set up my own "normal" webserver and have the solr
server be a sort of private web-based API (or possibly a front-end that, when a
user clicks on a search result link, just redirects the user to my "normal" web 
server that has the related file). That's easy enough. If that's not how solr 
is supposed to be used, please feel free to let me know. Thanks!
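
Erick's point below (construct a URL that means something in your environment) can
be implemented by attaching the file's real location at index time. A minimal SolrJ
sketch of that idea, assuming the schema defines a stored field named "url"; the
field name, host name and file paths are illustrative, not from the original thread:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class IndexWithUrl {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(new File("myworddoc.doc"), "application/msword");
            req.setParam("literal.id", "doc1");
            // Store the location your "normal" webserver will serve the file from;
            // render this stored field as the link in your search front-end.
            req.setParam("literal.url", "http://files.example.com/docs/myworddoc.doc");
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
            solr.request(req);
        }
    }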

On Jun 29, 2013, at 3:34 PM, Erick Erickson wrote:

> There's nothing built into the indexing process that stores URLs allowing
> you to fetch the document, you have to do that yourself. I'm not sure how
> the link is getting into the search results, you're assigning "doc1" as the
> ID of the doc, and I think the browse request handler, aka Solritas, is
> constructing the link as best it can. But that is only demo code, not
> intended to fetch the document.
> 
> In a typical app, you'll construct a URL for display that has meaning in
> _your_ environment, typically some way for the app server to know where the
> document is and how to fetch it. The browse request handler is showing you
> how you'd do this, but isn't meant to actually fetch the doc.
> 
> Best
> Erick
> 
> 
> On Sat, Jun 29, 2013 at 1:29 PM, gilawem  wrote:
> 
>> Sorry, I thought it was obvious. The links that are broken are the links
>> that are returned in the search results. Using the example in the
>> documentation I mentioned below, to load a word doc via
>>curl "
>> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F
>> "myfile=@myworddoc.doc"
>> 
>> the broken link that shows up in the search results is
>> http://localhost:8983/solr/collection1/doc1
>> 
>> so I just need to know where in the solr config to be able to handle
>> requests when the URL points to collection/some_doc
>> 
>> 
>> On Jun 29, 2013, at 1:08 PM, Erick Erickson wrote:
>> 
>>> What links? You haven't shown us what link you're clicking on
>>> that generates the 404 error.
>>> 
>>> You might want to review:
>>> http://wiki.apache.org/solr/UsingMailingLists
>>> 
>>> Best
>>> Erick
>>> 
>>> 
>>> On Fri, Jun 28, 2013 at 2:04 PM, MA LIG  wrote:
>>> 
 Hello,
 
 I ran the solr example as described in
 http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some
>> doc
 files to solr as described in
 http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I
>> used
 to load the files were of the form
 
 curl "
 http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"
>> -F
 "myfile=@test.doc"
 
 I can successfully see search results in
 http://localhost:8983/solr/collection1/browse<
 http://192.168.3.72:8983/solr/collection1/browse?q=test>
 .
 
 However, when I click on a link, I get a 404 not found error. How can I
 make these links work properly?
 
 Thanks in advance
 
 -gw
 
>> 
>> 



Re: documentCache not used in 4.3.1?

2013-06-29 Thread Tim Vaillancourt

That's a good idea, I'll try that next week.

Thanks!

Tim

On 29/06/13 12:39 PM, Erick Erickson wrote:

Tim:

Yeah, this doesn't make much sense to me either since,
as you say, you should be seeing some metrics upon
occasion. But do note that the underlying cache only gets
filled when getting documents to return in query results;
since there's no autowarming going on, it may come and
go.

But you can test this pretty quickly by lengthening your
autocommit interval or just not indexing anything
for a while, then run a bunch of queries and look at your
cache stats. That'll at least tell you whether it works at all.
You'll have to have hard commits turned off (or openSearcher
set to 'false') for that check too.

Best
Erick
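
The stats Erick mentions can also be polled over HTTP instead of through the admin
UI, which makes the repeated sampling Tim describes easier to script. A minimal
sketch using the standard Solr 4.x admin mbeans handler; the port and the core name
collection1 are assumptions to adjust for your setup:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CacheStats {
        public static void main(String[] args) throws Exception {
            // The mbeans handler reports the same cache stats as the admin UI.
            URL u = new URL("http://localhost:8983/solr/collection1"
                    + "/admin/mbeans?stats=true&cat=CACHE&wt=json");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(u.openStream(), "UTF-8"));
            try {
                for (String line; (line = in.readLine()) != null; ) {
                    System.out.println(line); // look for the documentCache entry
                }
            } finally {
                in.close();
            }
        }
    }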


On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Tim wrote:


Yes, we are softCommit'ing every 1000ms, but that should be enough time to
see metrics though, right? For example, I still get non-cumulative metrics
from the other caches (which are also throwaway). I've also curl/sampled
enough that I probably should have seen a value by now.

If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, June 29, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: documentCache not used in 4.3.1?

It's especially weird that the hit ratio is so high and you're not seeing
anything in the cache. Are you perhaps soft committing frequently? Soft
commits throw away all the top-level caches including documentCache I
think

Erick


On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:
Thanks Otis,

Yeah I realized after sending my e-mail that doc cache does not warm,
however I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodnetic wrote:


Hi Tim,

Not sure about the zeros in 4.3.1, but in SPM we see all these
numbers are non-0, though I haven't had the chance to confirm with
Solr 4.3.1.

Note that you can't really autowarm document cache...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/ Performance
Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt

wrote:

Hey guys,

This has to be a stupid question/I must be doing something wrong, but after
frequent load testing with documentCache enabled under Solr 4.3.1
with autoWarmCount=150, I'm noticing that my documentCache metrics
are always zero for non-cumulative.

At first I thought my commit rate is fast enough that I just never see
the non-cumulative result, but after 100s of samples I still always
get zero values.

Here is the current output of my documentCache from Solr's admin for 1 core:

"
- documentCache
  (http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache)
   - class: org.apache.solr.search.LRUCache
   - version: 1.0
   - description: LRU Cache(maxSize=512, initialSize=512,
     autowarmCount=150, regenerator=null)
   - src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
   - stats:
      - lookups: 0
      - hits: 0
      - hitratio: 0.00
      - inserts: 0
      - evictions: 0
      - size: 0
      - warmupTime: 0
      - cumulative_lookups: 65198986
      - cumulative_hits: 63075669
      - cumulative_hitratio: 0.96
      - cumulative_inserts: 2123317
      - cumulative_evictions: 1010262
"

The cumulative values seem to rise, suggesting doc cache is working, but at
the same time it seems I never see non-cumulative metrics, most importantly
warmupTime.

Am I doing something wrong, is this normal/by-design, or is there an issue
here?

Thanks for helping with my silly question! Have a good weekend,

Tim


Re: Varnish

2013-06-29 Thread Lance Norskog
Solr HTTP caching also supports ETags. These are unique keys for the
output of a query: if you send a query twice and the index has not
changed, the response will be the same. The ETag is generated from the
query string and the index generation number.


If Varnish supports ETags, you can keep some queries cached longer than
your timeout.


Lance
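
As a concrete illustration of the handshake Lance describes, a minimal sketch using
plain java.net; the URL and core name are illustrative, and note that Solr only
answers 304s when <httpCaching> in solrconfig.xml allows it (the stock example
config ships with never304="true", which disables this behavior):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class EtagProbe {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://localhost:8983/solr/collection1/select?q=*:*&wt=json");

            // First request: capture the ETag Solr computed for this query + index version.
            HttpURLConnection first = (HttpURLConnection) url.openConnection();
            String etag = first.getHeaderField("ETag");
            System.out.println("ETag: " + etag);

            // Second request: present the ETag. If the index has not changed, Solr can
            // answer 304 Not Modified, which is what lets a cache revalidate cheaply.
            HttpURLConnection second = (HttpURLConnection) url.openConnection();
            second.setRequestProperty("If-None-Match", etag);
            System.out.println("Status: " + second.getResponseCode());
        }
    }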

On 06/29/2013 05:51 PM, William Bell wrote:

On a large website, by putting 1 varnish in front of all 4 SOLR boxes we
were able to trim 25% off the load time (TTFB) of the page.

Our hit ratio was between 55 and 75%. We gave varnish 24GB of RAM, and were
not able to fill it under full load with a 10 minute cache timeout.

We get about 2.4M SOLR calls every 15 to 20 minutes.

One varnish was able to handle it with almost no lingering connections, and
load average of < 1.

Varnish is very optimized and worth trying.



On Sat, Jun 29, 2013 at 6:47 PM, William Bell  wrote:


OK.

Here is the answer for us. Here is a sample default.vcl. We are validating
that Last-Modified ( if (!beresp.http.last-modified) )
changes when the core is indexed and the index version changes.

This does 10 minutes of caching and a 1 hr grace period (if solr is down, it
will deliver results up to 1 hr old).

This uses the URL for caching.

You can also do:

http://localhost?PURGEME

To clear varnish if your IP is in the ACL list.


backend server1 {
 .host = "XXX.domain.com";
 .port = "8983";
 .probe = {
 .url = "/solr/pingall/select/?q=*%3A*";
 .interval = 5s;
 .timeout = 1s;
 .window = 5;
 .threshold = 3;
 }
}
backend server2{
 .host = "XXX1.domain.com";
 .port = "8983";
 .probe = {
 .url = "/solr/pingall/select/?q=*%3A*";
 .interval = 5s;
 .timeout = 1s;
 .window = 5;
 .threshold = 3;
 }
}
backend server3{
 .host = "XXX2.domain.com";
 .port = "8983";
 .probe = {
 .url = "/solr/pingall/select/?q=*%3A*";
 .interval = 5s;
 .timeout = 1s;
 .window = 5;
 .threshold = 3;
 }
}
backend server4{
 .host = "XXX3.domain.com";
 .port = "8983";
 .probe = {
 .url = "/solr/pingall/select/?q=*%3A*";
 .interval = 5s;
 .timeout = 1s;
 .window = 5;
 .threshold = 3;
 }
}

director default round-robin {
   {
 .backend = server1;
   }
   {
 .backend = server2;
   }
   {
 .backend = server3;
   }
   {
 .backend = server4;
   }
}

acl purge {
 "localhost";
 "10.0.1.0"/24;
 "10.0.3.0"/24;
}


sub vcl_recv {
if (req.url ~ "\?PURGEME$") {
 if (!client.ip ~ purge) {
 error 405 "Not allowed. " + client.ip;
 }
 ban("req.url ~ /");
 error 200 "Cached Cleared";
}
remove req.http.Cookie;
if (req.backend.healthy) {
  set req.grace = 15s;
} else {
  set req.grace = 1h;
}
return (lookup);
}

sub vcl_fetch {
   set beresp.grace = 1h;
   if (!beresp.http.last-modified) {
 set beresp.ttl = 600s;
   }
   if (beresp.ttl < 600s) {
 set beresp.ttl = 600s;
   }
   unset beresp.http.Set-Cookie;
}

sub vcl_deliver {
 if (obj.hits > 0) {
 set resp.http.X-Cache = "HIT";
 } else {
 set resp.http.X-Cache = "MISS";
 }
}

sub vcl_hash {
 hash_data(req.url);
 return (hash);
}






On Tue, Jun 25, 2013 at 4:44 PM, Learner  wrote:


Check this link..
http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html



--
View this message in context:
http://lucene.472066.n3.nabble.com/Varnish-tp4072057p4073205.html
Sent from the Solr - User mailing list archive at Nabble.com.




--
Bill Bell
billnb...@gmail.com
cell 720-256-8076








Re: Http status 503 Error in solr cloud setup

2013-06-29 Thread Lance Norskog
I do not know what causes the error, but this setup will not work. You need
one or three ZooKeepers. SolrCloud demands that a majority of the ZK servers
agree; with two ZKs the majority is still two, so losing either server stops
the ensemble, and you gain nothing over running a single one.


On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote:


Hi,

I set up 2 solr instances on 2 different machines and also configured 2
zookeeper servers on these machines. When I start solr on both
machines and try to access the solr web-admin, I get the following
error in the browser --


"Http status 503 -- server is shutting down"

When I set up a single standalone solr without zookeeper, I do not get
this error.


Any insights ?

Thanks and Regards,

Sagar Chaturvedi

Member Of Technical Staff

NEC Technologies India, Noida

09711931646





Re: Varnish

2013-06-29 Thread William Bell
On a large website, by putting 1 varnish in front of all 4 SOLR boxes we
were able to trim 25% off the load time (TTFB) of the page.

Our hit ratio was between 55 and 75%. We gave varnish 24GB of RAM, and were
not able to fill it under full load with a 10 minute cache timeout.

We get about 2.4M SOLR calls every 15 to 20 minutes.

One varnish was able to handle it with almost no lingering connections, and
load average of < 1.

Varnish is very optimized and worth trying.



On Sat, Jun 29, 2013 at 6:47 PM, William Bell  wrote:

> OK.
>
> Here is the answer for us. Here is a sample default.vcl. We are validating
> that Last-Modified ( if (!beresp.http.last-modified) )
> changes when the core is indexed and the index version changes.
>
> This does 10 minutes of caching and a 1 hr grace period (if solr is down, it
> will deliver results up to 1 hr old).
>
> This uses the URL for caching.
>
> You can also do:
>
> http://localhost?PURGEME
>
> To clear varnish if your IP is in the ACL list.
>
>
> backend server1 {
> .host = "XXX.domain.com";
> .port = "8983";
> .probe = {
> .url = "/solr/pingall/select/?q=*%3A*";
> .interval = 5s;
> .timeout = 1s;
> .window = 5;
> .threshold = 3;
> }
> }
> backend server2{
> .host = "XXX1.domain.com";
> .port = "8983";
> .probe = {
> .url = "/solr/pingall/select/?q=*%3A*";
> .interval = 5s;
> .timeout = 1s;
> .window = 5;
> .threshold = 3;
> }
> }
> backend server3{
> .host = "XXX2.domain.com";
> .port = "8983";
> .probe = {
> .url = "/solr/pingall/select/?q=*%3A*";
> .interval = 5s;
> .timeout = 1s;
> .window = 5;
> .threshold = 3;
> }
> }
> backend server4{
> .host = "XXX3.domain.com";
> .port = "8983";
> .probe = {
> .url = "/solr/pingall/select/?q=*%3A*";
> .interval = 5s;
> .timeout = 1s;
> .window = 5;
> .threshold = 3;
> }
> }
>
> director default round-robin {
>   {
> .backend = server1;
>   }
>   {
> .backend = server2;
>   }
>   {
> .backend = server3;
>   }
>   {
> .backend = server4;
>   }
> }
>
> acl purge {
> "localhost";
> "10.0.1.0"/24;
> "10.0.3.0"/24;
> }
>
>
> sub vcl_recv {
>if (req.url ~ "\?PURGEME$") {
> if (!client.ip ~ purge) {
> error 405 "Not allowed. " + client.ip;
> }
> ban("req.url ~ /");
> error 200 "Cached Cleared";
>}
>remove req.http.Cookie;
>if (req.backend.healthy) {
>  set req.grace = 15s;
>} else {
>  set req.grace = 1h;
>}
>return (lookup);
> }
>
> sub vcl_fetch {
>   set beresp.grace = 1h;
>   if (!beresp.http.last-modified) {
> set beresp.ttl = 600s;
>   }
>   if (beresp.ttl < 600s) {
> set beresp.ttl = 600s;
>   }
>   unset beresp.http.Set-Cookie;
> }
>
> sub vcl_deliver {
> if (obj.hits > 0) {
> set resp.http.X-Cache = "HIT";
> } else {
> set resp.http.X-Cache = "MISS";
> }
> }
>
> sub vcl_hash {
> hash_data(req.url);
> return (hash);
> }
>
>
>
>
>
>
> On Tue, Jun 25, 2013 at 4:44 PM, Learner  wrote:
>
>> Check this link..
>> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Varnish-tp4072057p4073205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Varnish

2013-06-29 Thread William Bell
OK.

Here is the answer for us. Here is a sample default.vcl. We are validating
that Last-Modified ( if (!beresp.http.last-modified) )
changes when the core is indexed and the index version changes.

This does 10 minutes of caching and a 1 hr grace period (if solr is down, it
will deliver results up to 1 hr old).

This uses the URL for caching.

You can also do:

http://localhost?PURGEME

To clear varnish if your IP is in the ACL list.


backend server1 {
.host = "XXX.domain.com";
.port = "8983";
.probe = {
.url = "/solr/pingall/select/?q=*%3A*";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}
backend server2{
.host = "XXX1.domain.com";
.port = "8983";
.probe = {
.url = "/solr/pingall/select/?q=*%3A*";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}
backend server3{
.host = "XXX2.domain.com";
.port = "8983";
.probe = {
.url = "/solr/pingall/select/?q=*%3A*";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}
backend server4{
.host = "XXX3.domain.com";
.port = "8983";
.probe = {
.url = "/solr/pingall/select/?q=*%3A*";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}

director default round-robin {
  {
.backend = server1;
  }
  {
.backend = server2;
  }
  {
.backend = server3;
  }
  {
.backend = server4;
  }
}

acl purge {
"localhost";
"10.0.1.0"/24;
"10.0.3.0"/24;
}


sub vcl_recv {
   if (req.url ~ "\?PURGEME$") {
if (!client.ip ~ purge) {
error 405 "Not allowed. " + client.ip;
}
ban("req.url ~ /");
error 200 "Cached Cleared";
   }
   remove req.http.Cookie;
   if (req.backend.healthy) {
 set req.grace = 15s;
   } else {
 set req.grace = 1h;
   }
   return (lookup);
}

sub vcl_fetch {
  set beresp.grace = 1h;
  if (!beresp.http.last-modified) {
set beresp.ttl = 600s;
  }
  if (beresp.ttl < 600s) {
set beresp.ttl = 600s;
  }
  unset beresp.http.Set-Cookie;
}

sub vcl_deliver {
if (obj.hits > 0) {
set resp.http.X-Cache = "HIT";
} else {
set resp.http.X-Cache = "MISS";
}
}

sub vcl_hash {
hash_data(req.url);
return (hash);
}






On Tue, Jun 25, 2013 at 4:44 PM, Learner  wrote:

> Check this link..
> http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Varnish-tp4072057p4073205.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Improving performance to return 2000+ documents

2013-06-29 Thread Peter Sturge
Hello Utkarsh,
This may or may not be relevant for your use-case, but the way we deal with
this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
(user selectable). We can then page the results, changing the start
parameter to return the next set. This allows us to 'retrieve' millions of
documents - we just do it at the user's leisure, rather than make them wait
for the whole lot in one go.
This works well because users very rarely want to see ALL 2000 (or whatever
number) documents at one time - it's simply too much to take in.
If your use-case involves an automated or offline procedure (e.g. running a
report or some data-mining op), then presumably it doesn't matter so much
if it takes a bit longer (as long as it returns in some reasonable time).
Have you looked at doing paging on the client side? This will hugely
speed up your search time.
HTH
Peter
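
A minimal SolrJ sketch of the paging Peter describes, reusing the query from the
thread below; the URL, collection name and page size are illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PagedFetch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/prodinfo");
            int pageSize = 100; // user-selectable page size instead of rows=2000
            for (int start = 0; start < 2000; start += pageSize) {
                SolrQuery q = new SolrQuery("allText:huggies diapers size 1");
                q.setStart(start);
                q.setRows(pageSize);
                QueryResponse rsp = solr.query(q);
                long numFound = rsp.getResults().getNumFound();
                System.out.println("page at " + start + ": "
                        + rsp.getResults().size() + " docs of " + numFound);
                if (start + pageSize >= numFound) {
                    break; // no more pages
                }
            }
        }
    }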



On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson wrote:

> Well, depending on how many docs get served
> from the cache the time will vary. But this is
> just ugly, if you can avoid this use-case it would
> be a Good Thing.
>
> Problem here is that each and every shard must
> assemble the list of 2,000 documents (just ID and
> sort criteria, usually score).
>
> Then the node serving the original request merges
> the sub-lists to pick the top 2,000. Then the node
> sends another request to each shard to get
> the full document. Then the node merges this
> into the full list to return to the user.
>
> Solr really isn't built for this use-case, is it actually
> a compelling situation?
>
> And having your document cache set at 1M is kinda
> high if you have very big documents.
>
> FWIW,
> Erick
>
>
> > On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:
>
> > Also, I don't see a consistent response time from solr, I ran ab again
> and
> > I get this:
> >
> > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> >
> >
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > "
> >
> >
> > Benchmarking x.amazonaws.com (be patient)
> > Completed 100 requests
> > Completed 200 requests
> > Completed 300 requests
> > Completed 400 requests
> > Completed 500 requests
> > Finished 500 requests
> >
> >
> > Server Software:
> > Server Hostname:   x.amazonaws.com
> > Server Port:8983
> >
> > Document Path:
> >
> >
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > Document Length:1538537 bytes
> >
> > Concurrency Level:  10
> > Time taken for tests:   10.858 seconds
> > Complete requests:  500
> > Failed requests:8
> >(Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> > Write errors:   0
> > Total transferred:  769297992 bytes
> > HTML transferred:   769268492 bytes
> > Requests per second:46.05 [#/sec] (mean)
> > Time per request:   217.167 [ms] (mean)
> > Time per request:   21.717 [ms] (mean, across all concurrent
> requests)
> > Transfer rate:  69187.90 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >   min  mean[+/-sd] median   max
> > Connect:00   0.3  0   2
> > Processing:   110  215  72.0190 497
> > Waiting:   91  180  70.5152 473
> > Total:112  216  72.0191 497
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%191
> >   66%225
> >   75%252
> >   80%272
> >   90%319
> >   95%364
> >   98%420
> >   99%453
> >  100%497 (longest request)
> >
> >
> > Sometimes it takes a lot of time, sometimes its pretty quick.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar wrote:
> >
> > > Hello,
> > >
> > > I have a usecase where I need to retrive top 2000 documents matching a
> > > query.
> > > What are the parameters (in query, solrconfig, schema) I shoud look at
> to
> > > improve this?
> > >
> > > I have 45M documents in 3node solrcloud 4.3.1 with 3 shards, with 30GB
> > > RAM, 8vCPU and 7GB JVM heap size.
> > >
> > > I have documentCache:
> > > <documentCache ... initialSize="100" autowarmCount="0"/>
> > >
> > > allText is a copyField.
> > >
> > > This is the result I get:
> > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> > >
> >
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > "
> > >
> > > Benchmarking x.amazonaws.com (be patient)
> > > Completed 100 requests
> > > Completed 200 requests
> > > Completed 300 requests
> > > Completed 400 requests
> > > Completed 500 requests
> > > Finished 500 requests
> > >
> > >
> > > Server Software:
> > > Server Hostname:x.amazonaws.com
> > > Server Port:8983
> > >
> > > Document Path:
> > >
> >
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > Document Length:1538537 bytes
> > >
> > > Concurrency Level:  10
> > > Time taken for tests:   35.999 seconds

Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
https://issues.apache.org/jira/browse/SOLR-4978


On Sat, Jun 29, 2013 at 2:33 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Yes we need to use getTimestamp instead of getDate. Please create an issue.
>
> On Sat, Jun 29, 2013 at 11:48 PM, Bill Au  wrote:
> > So disabling convertType does provide a workaround for my problem with
> > datetime column.  But the problem still exists when convertType is
> enabled
> > because DIH is not doing the conversion correctly for a solr date field.
> >  Solr date field does have a time portion but java.sql.Date does not.  So
> > DIH should not be calling ResultSet.getDate() for a solr date field.  It
> > should really be calling ResultSet.getTimestamp() instead.  Is the fix
> this
> > simple?  Am I missing anything?
> >
> > If the fix is this simple I can submit and commit a patch to DIH.
> >
> > Bill
> >
> >
> > On Sat, Jun 29, 2013 at 12:13 PM, Bill Au  wrote:
> >
> >> Setting convertType=false does solve the datetime issue.  But there are
> >> now other columns that were working before but not working now.  Since I
> >> have already done some research into the datetime to date issue and not
> >> been able to find a solution, I think I will have to keep convertType
> set
> >> to false and deal with the other column type that are not working now.
> >>
> >> Thanks for your help.
> >>
> >> Bill
> >>
> >>
> >> On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:
> >>
> >>> I just double check my config.  We are using convertType=true.  Someone
> >>> else came up with the config so I am not sure why we are using it.  I
> will
> >>> try with it set to false to see if something else will break.  Thanks
> for
> >>> pointing that out.
> >>>
> >>> This is my first time using DIH.  I really like what I have seen so
> far.
> >>>
> >>> Bill
> >>>
> >>>
> >>> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
> >>> shalinman...@gmail.com> wrote:
> >>>
>  The default in JdbcDataSource is to use ResultSet.getObject which
>  returns the underlying database's type. The type specific methods in
>  ResultSet are not invoked unless you are using convertType="true".
> 
>  Is MySQL actually returning java.sql.Timestamp objects?
> 
>  On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
>  > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
>  > running into a very strange problem where data from a datetime column is
>  > being imported with the right date but the time is 00:00:00.  I tried
>  > using SQL DATE_FORMAT() and also DIH DateFormatTransformer but nothing
>  > works.  In the raw debug response of DIH, it looks like the time portion
>  > of the datetime data is already 00:00:00 in the Solr jdbc query result.
>  >
>  > So I looked at the source code of DIH JdbcDataSource class.  It is
>  using
>  > java.sql.ResultSet and its getDate() method to handle date column.
>  The
>  > getDate() method returns java.sql.Date.  The java api doc for
>  java.sql.Date
>  >
>  > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>  >
>  > states that:
>  >
>  > "To conform with the definition of SQL DATE, the millisecond values
>  wrapped
>  > by a java.sql.Date instance must be 'normalized' by setting the
> hours,
>  > minutes, seconds, and milliseconds to zero in the particular time
> zone
>  with
>  > which the instance is associated."
>  >
>  > This seems to be describing exactly my problem.  Has anyone else
> notice
>  > this problem?  Has anyone use DIH to index SQL datetime
> successfully?
>   If
>  > so can you send me the relevant portion of the DIH config?
>  >
>  > Bill
> 
> 
> 
>  --
>  Regards,
>  Shalin Shekhar Mangar.
> 
> >>>
> >>>
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: documentCache not used in 4.3.1?

2013-06-29 Thread Erick Erickson
Tim:

Yeah, this doesn't make much sense to me either since,
as you say, you should be seeing some metrics upon
occasion. But do note that the underlying cache only gets
filled when getting documents to return in query results;
since there's no autowarming going on, it may come and
go.

But you can test this pretty quickly by lengthening your
autocommit interval or just not indexing anything
for a while, then run a bunch of queries and look at your
cache stats. That'll at least tell you whether it works at all.
You'll have to have hard commits turned off (or openSearcher
set to 'false') for that check too.

Best
Erick


On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Tim wrote:

> Yes, we are softCommit'ing every 1000ms, but that should be enough time to
> see metrics though, right? For example, I still get non-cumulative metrics
> from the other caches (which are also throwaway). I've also curl/sampled
> enough that I probably should have seen a value by now.
>
> If anyone else can reproduce this on 4.3.1 I will feel less crazy :).
>
> Cheers,
>
> Tim
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, June 29, 2013 10:13 AM
> To: solr-user@lucene.apache.org
> Subject: Re: documentCache not used in 4.3.1?
>
> It's especially weird that the hit ratio is so high and you're not seeing
> anything in the cache. Are you perhaps soft committing frequently? Soft
> commits throw away all the top-level caches including documentCache I
> think
>
> Erick
>
>
> On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:
>
> > Thanks Otis,
> >
> > Yeah I realized after sending my e-mail that doc cache does not warm,
> > however I'm still lost on why there are no other metrics.
> >
> > Thanks!
> >
> > Tim
> >
> >
> > On 28 June 2013 16:22, Otis Gospodnetic 
> > wrote:
> >
> > > Hi Tim,
> > >
> > > Not sure about the zeros in 4.3.1, but in SPM we see all these
> > > numbers are non-0, though I haven't had the chance to confirm with
> Solr 4.3.1.
> > >
> > > Note that you can't really autowarm document cache...
> > >
> > > Otis
> > > --
> > > Solr & ElasticSearch Support -- http://sematext.com/ Performance
> > > Monitoring -- http://sematext.com/spm
> > >
> > >
> > >
> > > On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt
> > > 
> > > wrote:
> > > > Hey guys,
> > > >
> > > > This has to be a stupid question/I must be doing something wrong,
> > > > but
> > > after
> > > > frequent load testing with documentCache enabled under Solr 4.3.1
> > > > with autoWarmCount=150, I'm noticing that my documentCache metrics
> > > > are
> > always
> > > > zero for non-cumulative.
> > > >
> > > > At first I thought my commit rate is fast enough I just never see
> > > > the non-cumulative result, but after 100s of samples I still always
> > > > get zero values.
> > > >
> > > > Here is the current output of my documentCache from Solr's admin
> > > > for 1
> > > core:
> > > >
> > > > "
> > > >
> > > >- documentCache<
> > >
> > http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?en
> > try=documentCache
> > > >
> > > >   - class:org.apache.solr.search.LRUCache
> > > >   - version:1.0
> > > >   - description:LRU Cache(maxSize=512, initialSize=512,
> > > >   autowarmCount=150, regenerator=null)
> > > >   - src:$URL: https:/
> > > >   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
> > > >   solr/core/src/java/org/apache/solr/search/LRUCache.java<
> > >
> > https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/s
> > olr/core/src/java/org/apache/solr/search/LRUCache.java
> > > >$
> > > >   - stats:
> > > >  - lookups:0
> > > >  - hits:0
> > > >  - hitratio:0.00
> > > >  - inserts:0
> > > >  - evictions:0
> > > >  - size:0
> > > >  - warmupTime:0
> > > >  - cumulative_lookups:65198986
> > > >  - cumulative_hits:63075669
> > > >  - cumulative_hitratio:0.96
> > > >  - cumulative_inserts:2123317
> > > >  - cumulative_evictions:1010262
> > > >   "
> > > >
> > > > The cumulative values seem to rise, suggesting doc cache is
> > > > working,
> > but
> > > at
> > > > the same time it seems I never see non-cumulative metrics, most
> > > importantly
> > > > warmupTime.
> > > >
> > > > Am I doing something wrong, is this normal/by-design, or is there
> > > > an
> > > issue
> > > > here?
> > > >
> > > > Thanks for helping with my silly question! Have a good weekend,
> > > >
> > > > Tim
> > >
> >
>


Re: broken links returned from solr search

2013-06-29 Thread Erick Erickson
There's nothing built into the indexing process that stores URLs allowing
you to fetch the document, you have to do that yourself. I'm not sure how
the link is getting into the search results, you're assigning "doc1" as the
ID of the doc, and I think the browse request handler, aka Solritas, is
constructing the link as best it can. But that is only demo code, not
intended to fetch the document.

In a typical app, you'll construct a URL for display that has meaning in
_your_ environment, typically some way for the app server to know where the
document is and how to fetch it. The browse request handler is showing you
how you'd do this, but isn't meant to actually fetch the doc.

Best
Erick


On Sat, Jun 29, 2013 at 1:29 PM, gilawem  wrote:

> Sorry, I thought it was obvious. The links that are broken are the links
> that are returned in the search results. Using the example in the
> documentation I mentioned below, to load a word doc via
> curl "
> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F
> "myfile=@myworddoc.doc"
>
> the broken link that shows up in the search results is
> http://localhost:8983/solr/collection1/doc1
>
> so I just need to know where in the solr config to be able to handle
> requests when the URL points to collection/some_doc
>
>
> On Jun 29, 2013, at 1:08 PM, Erick Erickson wrote:
>
> > What links? You haven't shown us what link you're clicking on
> > that generates the 404 error.
> >
> > You might want to review:
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > Best
> > Erick
> >
> >
> > On Fri, Jun 28, 2013 at 2:04 PM, MA LIG  wrote:
> >
> >> Hello,
> >>
> >> I ran the solr example as described in
> >> http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some
> doc
> >> files to solr as described in
> >> http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I
> used
> >> to load the files were of the form
> >>
> >>  curl "
> >> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"
> -F
> >> "myfile=@test.doc"
> >>
> >> I can successfully see search results in
> >> http://localhost:8983/solr/collection1/browse<
> >> http://192.168.3.72:8983/solr/collection1/browse?q=test>
> >> .
> >>
> >> However, when I click on a link, I get a 404 not found error. How can I
> >> make these links work properly?
> >>
> >> Thanks in advance
> >>
> >> -gw
> >>
>
>


RE: documentCache not used in 4.3.1?

2013-06-29 Thread Vaillancourt, Tim
Yes, we are softCommit'ing every 1000ms, but that should be enough time to see 
metrics though, right? For example, I still get non-cumulative metrics from the 
other caches (which are also throwaway). I've also curl/sampled enough that I
probably should have seen a value by now.

If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, June 29, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: documentCache not used in 4.3.1?

It's especially weird that the hit ratio is so high and you're not seeing 
anything in the cache. Are you perhaps soft committing frequently? Soft commits 
throw away all the top-level caches including documentCache I think

Erick


On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:

> Thanks Otis,
>
> Yeah I realized after sending my e-mail that doc cache does not warm, 
> however I'm still lost on why there are no other metrics.
>
> Thanks!
>
> Tim
>
>
> On 28 June 2013 16:22, Otis Gospodnetic 
> wrote:
>
> > Hi Tim,
> >
> > Not sure about the zeros in 4.3.1, but in SPM we see all these 
> > numbers are non-0, though I haven't had the chance to confirm with Solr 
> > 4.3.1.
> >
> > Note that you can't really autowarm document cache...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support -- http://sematext.com/ Performance 
> > Monitoring -- http://sematext.com/spm
> >
> >
> >
> > On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt 
> > 
> > wrote:
> > > Hey guys,
> > >
> > > This has to be a stupid question/I must be doing something wrong, 
> > > but
> > after
> > > frequent load testing with documentCache enabled under Solr 4.3.1 
> > > with autoWarmCount=150, I'm noticing that my documentCache metrics 
> > > are
> always
> > > zero for non-cumulative.
> > >
> > > At first I thought my commit rate is fast enough I just never see 
> > > the non-cumulative result, but after 100s of samples I still always
> > > get zero values.
> > >
> > > Here is the current output of my documentCache from Solr's admin 
> > > for 1
> > core:
> > >
> > > "
> > >
> > >- documentCache<
> >
> http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?en
> try=documentCache
> > >
> > >   - class:org.apache.solr.search.LRUCache
> > >   - version:1.0
> > >   - description:LRU Cache(maxSize=512, initialSize=512,
> > >   autowarmCount=150, regenerator=null)
> > >   - src:$URL: https:/
> > >   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
> > >   solr/core/src/java/org/apache/solr/search/LRUCache.java<
> >
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/s
> olr/core/src/java/org/apache/solr/search/LRUCache.java
> > >$
> > >   - stats:
> > >  - lookups:0
> > >  - hits:0
> > >  - hitratio:0.00
> > >  - inserts:0
> > >  - evictions:0
> > >  - size:0
> > >  - warmupTime:0
> > >  - cumulative_lookups:65198986
> > >  - cumulative_hits:63075669
> > >  - cumulative_hitratio:0.96
> > >  - cumulative_inserts:2123317
> > >  - cumulative_evictions:1010262
> > >   "
> > >
> > > The cumulative values seem to rise, suggesting doc cache is 
> > > working,
> but
> > at
> > > the same time it seems I never see non-cumulative metrics, most
> > importantly
> > > warmupTime.
> > >
> > > Am I doing something wrong, is this normal/by-design, or is there 
> > > an
> > issue
> > > here?
> > >
> > > Thanks for helping with my silly question! Have a good weekend,
> > >
> > > Tim
> >
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Shalin Shekhar Mangar
Yes we need to use getTimestamp instead of getDate. Please create an issue.

On Sat, Jun 29, 2013 at 11:48 PM, Bill Au  wrote:
> So disabling convertType does provide a workaround for my problem with
> datetime column.  But the problem still exists when convertType is enabled
> because DIH is not doing the conversion correctly for a solr date field.
>  Solr date field does have a time portion but java.sql.Date does not.  So
> DIH should not be calling ResultSet.getDate() for a solr date field.  It
> should really be calling ResultSet.getTimestamp() instead.  Is the fix this
> simple?  Am I missing anything?
>
> If the fix is this simple I can submit and commit a patch to DIH.
>
> Bill
>
>
> On Sat, Jun 29, 2013 at 12:13 PM, Bill Au  wrote:
>
>> Setting convertType=false does solve the datetime issue.  But there are
>> now other columns that were working before but not working now.  Since I
>> have already done some research into the datetime to date issue and not
>> been able to find a solution, I think I will have to keep convertType set
>> to false and deal with the other column type that are not working now.
>>
>> Thanks for your help.
>>
>> Bill
>>
>>
>> On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:
>>
>>> I just double check my config.  We are using convertType=true.  Someone
>>> else came up with the config so I am not sure why we are using it.  I will
>>> try with it set to false to see if something else will break.  Thanks for
>>> pointing that out.
>>>
>>> This is my first time using DIH.  I really like what I have seen so far.
>>>
>>> Bill
>>>
>>>
>>> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
>>> shalinman...@gmail.com> wrote:
>>>
 The default in JdbcDataSource is to use ResultSet.getObject which
 returns the underlying database's type. The type specific methods in
 ResultSet are not invoked unless you are using convertType="true".

 Is MySQL actually returning java.sql.Timestamp objects?

 On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
 > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
 > running into a very strange problem where data from a datetime column is
 > being imported with the right date but the time is 00:00:00.  I tried using
 > SQL DATE_FORMAT() and also DIH DateFormatTransformer but nothing works.
 > In the raw debug response of DIH, it looks like the time portion of the
 > datetime data is already 00:00:00 in the Solr jdbc query result.
 >
 > So I looked at the source code of DIH JdbcDataSource class.  It is
 using
 > java.sql.ResultSet and its getDate() method to handle date column.  The
 > getDate() method returns java.sql.Date.  The java api doc for
 java.sql.Date
 >
 > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
 >
 > states that:
 >
 > "To conform with the definition of SQL DATE, the millisecond values
 wrapped
 > by a java.sql.Date instance must be 'normalized' by setting the hours,
 > minutes, seconds, and milliseconds to zero in the particular time zone
 with
 > which the instance is associated."
 >
 > This seems to be describing exactly my problem.  Has anyone else notice
 > this problem?  Has anyone use DIH to index SQL datetime successfully?
  If
 > so can you send me the relevant portion of the DIH config?
 >
 > Bill



 --
 Regards,
 Shalin Shekhar Mangar.

>>>
>>>
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
So disabling convertType does provide a workaround for my problem with the
datetime column.  But the problem still exists when convertType is enabled
because DIH is not doing the conversion correctly for a Solr date field.
A Solr date field does have a time portion but java.sql.Date does not.  So
DIH should not be calling ResultSet.getDate() for a Solr date field.  It
should really be calling ResultSet.getTimestamp() instead.  Is the fix this
simple?  Am I missing anything?

If the fix is this simple I can submit and commit a patch to DIH.

Bill
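
A standalone sketch of the difference Bill describes; the JDBC URL, credentials,
table and column names are hypothetical, and this is illustrative code, not the
DIH source itself:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DatetimeCheck {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/test", "user", "pass");
            try {
                Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SELECT modified FROM docs");
                while (rs.next()) {
                    // java.sql.Date "normalizes" hours/minutes/seconds to zero,
                    // which is exactly the 00:00:00 symptom described above.
                    System.out.println("getDate():      " + rs.getDate("modified"));
                    // java.sql.Timestamp keeps the time of day, which is what a
                    // Solr date field needs.
                    System.out.println("getTimestamp(): " + rs.getTimestamp("modified"));
                }
            } finally {
                conn.close();
            }
        }
    }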


On Sat, Jun 29, 2013 at 12:13 PM, Bill Au  wrote:

> Setting convertType=false does solve the datetime issue.  But there are
> now other columns that were working before but not working now.  Since I
> have already done some research into the datetime to date issue and not
> been able to find a solution, I think I will have to keep convertType set
> to false and deal with the other column type that are not working now.
>
> Thanks for your help.
>
> Bill
>
>
> On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:
>
>> I just double check my config.  We are using convertType=true.  Someone
>> else came up with the config so I am not sure why we are using it.  I will
>> try with it set to false to see if something else will break.  Thanks for
>> pointing that out.
>>
>> This is my first time using DIH.  I really like what I have seen so far.
>>
>> Bill
>>
>>
>> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>> The default in JdbcDataSource is to use ResultSet.getObject which
>>> returns the underlying database's type. The type specific methods in
>>> ResultSet are not invoked unless you are using convertType="true".
>>>
>>> Is MySQL actually returning java.sql.Timestamp objects?
>>>
>>> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
>>> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am
>>> > running into a very strange problem where data from a datetime column is
>>> > being imported with the right date but the time is 00:00:00.  I tried
>>> > using SQL DATE_FORMAT() and also DIH DateFormatTransformer but nothing
>>> > works.  In the raw debug response of DIH, it looks like the time portion
>>> > of the datetime data is already 00:00:00 in the Solr jdbc query result.
>>> >
>>> > So I looked at the source code of DIH JdbcDataSource class.  It is
>>> using
>>> > java.sql.ResultSet and its getDate() method to handle date column.  The
>>> > getDate() method returns java.sql.Date.  The java api doc for
>>> java.sql.Date
>>> >
>>> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>>> >
>>> > states that:
>>> >
>>> > "To conform with the definition of SQL DATE, the millisecond values
>>> wrapped
>>> > by a java.sql.Date instance must be 'normalized' by setting the hours,
>>> > minutes, seconds, and milliseconds to zero in the particular time zone
>>> with
>>> > which the instance is associated."
>>> >
>>> > This seems to be describing exactly my problem.  Has anyone else notice
>>> > this problem?  Has anyone use DIH to index SQL datetime successfully?
>>>  If
>>> > so can you send me the relevant portion of the DIH config?
>>> >
>>> > Bill
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>>
>


Re: broken links returned from solr search

2013-06-29 Thread gilawem
Sorry, I thought it was obvious. The links that are broken are the links that
are returned in the search results. Using the example in the documentation I 
mentioned below, to load a word doc via
curl 
"http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"; -F 
"myfile=@myworddoc.doc"

the broken link that shows up in the search results is 
http://localhost:8983/solr/collection1/doc1

so I just need to know where in the solr config to be able to handle requests 
when the URL points to collection/some_doc


On Jun 29, 2013, at 1:08 PM, Erick Erickson wrote:

> What links? You haven't shown us what link you're clicking on
> that generates the 404 error.
> 
> You might want to review:
> http://wiki.apache.org/solr/UsingMailingLists
> 
> Best
> Erick
> 
> 
> On Fri, Jun 28, 2013 at 2:04 PM, MA LIG  wrote:
> 
>> Hello,
>> 
>> I ran the solr example as described in
>> http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc
>> files to solr as described in
>> http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used
>> to load the files were of the form
>> 
>>  curl "
>> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F
>> "myfile=@test.doc"
>> 
>> I can successfully see search results in
>> http://localhost:8983/solr/collection1/browse<
>> http://192.168.3.72:8983/solr/collection1/browse?q=test>
>> .
>> 
>> However, when I click on a link, I get a 404 not found error. How can I
>> make these links work properly?
>> 
>> Thanks in advance
>> 
>> -gw
>> 



Re: cores sharing an instance

2013-06-29 Thread Erick Erickson
Well, the code is all in the same JVM, so there's no
reason I can think of that a singleton approach wouldn't
work. All the multithreaded caveats apply.

Best
Erick
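
A minimal sketch of such a singleton, assuming the class is loaded by a classloader
all cores share (see Roman's classloader caveat elsewhere in this thread); the class
name is illustrative:

    public final class SharedCoordinator {
        private static volatile SharedCoordinator instance;

        private SharedCoordinator() {
            // the expensive, RAM-heavy initialization happens once per JVM
        }

        // Standard double-checked locking; safe because the field is volatile.
        public static SharedCoordinator get() {
            SharedCoordinator local = instance;
            if (local == null) {
                synchronized (SharedCoordinator.class) {
                    local = instance;
                    if (local == null) {
                        instance = local = new SharedCoordinator();
                    }
                }
            }
            return local;
        }
    }

Any per-core component can then call SharedCoordinator.get(), but only if every core
resolves the class through the same classloader; otherwise each core quietly gets
its own "singleton".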


On Fri, Jun 28, 2013 at 3:44 PM, Peyman Faratin wrote:

> Hi
>
> I have a multicore setup (in 4.3.0). Is it possible for one core to share
> an instance of its class with other cores at run time? i.e.
>
> At run time core 1 makes an instance of object O_i
>
> core 1 --> object O_i
> core 2
> ---
> core n
>
> then can core K access O_i? I know they can share properties but is it
> possible to share objects?
>
> thank you
>
>


Re: FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-06-29 Thread Erick Erickson
Mike:

One issue is that you're forcing all the work onto the Solr
server, and single-threading to boot by using DIH. You can
consider moving to a SolrJ model where you can have
N clients sending data to Solr if you can partition the data
up amongst the N clients cleanly.

FWIW,
Erick
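
A minimal sketch of the SolrJ model Erick suggests, using ConcurrentUpdateSolrServer
from SolrJ 4.x, which queues documents and streams them over several threads; the
URL, queue size, thread count and field names are illustrative:

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            // Queues up to 10000 docs and streams them to Solr over 4 threads.
            ConcurrentUpdateSolrServer solr = new ConcurrentUpdateSolrServer(
                    "http://localhost:8983/solr/collection1", 10000, 4);
            for (int i = 0; i < 1000000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("field1", "value " + i);
                solr.add(doc); // queued; background threads do the HTTP work
            }
            solr.blockUntilFinished();
            solr.commit();   // one commit at the very end
            solr.shutdown();
        }
    }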


On Sat, Jun 29, 2013 at 8:20 AM, Ahmet Arslan  wrote:

> Hi Mike,
>
>
> You could try http://wiki.apache.org/solr/UpdateCSV
>
> And make sure you commit at the very end.
>
>
>
>
> 
>  From: Mike L. 
> To: "solr-user@lucene.apache.org" 
> Sent: Saturday, June 29, 2013 3:15 AM
> Subject: FileDataSource vs JdbcDataSouce (speed) Solr 3.5
>
>
>
> I've been working on improving index time with a JdbcDataSource DIH based
> config and found it not to be as performant as I'd hoped for, for various
> reasons, not specifically due to solr. With that said, I decided to switch
> gears a bit and test out a FileDataSource setup... I assumed that by
> eliminating network latency, I should see drastic improvements in terms of
> import time... but I'm a bit surprised that this process seems to run much
> slower, at least the way I've initially coded it. (below)
>
> The below is a barebone file import that I wrote which consumes a tab
> delimited file. Nothing fancy here. The regex just separates out the
> fields... Is there a faster approach to doing this? If so, what is it?
>
> Also, what is the "recommended" approach in terms of index/importing data?
> I know that may come across as a vague question as there are various
> options available, but which one would be considered the "standard"
> approach within a production enterprise environment.
>
>
> (below has been cleansed)
>
> <dataConfig>
>   <dataSource type="FileDataSource" name="file"/>
>   <document>
>     <entity name="file"
>             processor="LineEntityProcessor"
>             url="[location_of_file]/file.csv"
>             dataSource="file"
>             transformer="RegexTransformer,TemplateTransformer">
>       <field column="rawLine"
>              regex="^(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)\t(.*)$"
>              groupNames="field1,field2,field3,field4,field5,field6,field7,field8,field9,field10,field11,field12,field13,field14,field15,field16,field17,field18,field19,field10,field11,field12"/>
>     </entity>
>   </document>
> </dataConfig>
>
> Thanks in advance,
> Mike
>
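
For Ahmet's UpdateCSV suggestion above, a minimal SolrJ sketch of streaming a
tab-delimited file to the CSV handler; the URL, file name and field list are
illustrative:

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class CsvImport {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
            req.addFile(new File("file.csv"), "text/plain;charset=utf-8");
            req.setParam("separator", "\t");                    // tab-delimited input
            req.setParam("header", "false");                    // the file has no header row
            req.setParam("fieldnames", "field1,field2,field3"); // map columns to fields
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // commit once at the end
            solr.request(req);
        }
    }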


Re: Improving performance to return 2000+ documents

2013-06-29 Thread Erick Erickson
Well, depending on how many docs get served
from the cache the time will vary. But this is
just ugly; if you can avoid this use-case it would
be a Good Thing.

Problem here is that each and every shard must
assemble the list of 2,000 documents (just ID and
sort criteria, usually score).

Then the node serving the original request merges
the sub-lists to pick the top 2,000. Then the node
sends another request to each shard to get
the full document. Then the node merges this
into the full list to return to the user.

Solr really isn't built for this use-case, is it actually
a compelling situation?

And having your document cache set at 1M is kinda
high if you have very big documents.

FWIW,
Erick


On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:

> Also, I don't see a consistent response time from solr, I ran ab again and
> I get this:
>
> ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
>
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> "
>
>
> Benchmarking x.amazonaws.com (be patient)
> Completed 100 requests
> Completed 200 requests
> Completed 300 requests
> Completed 400 requests
> Completed 500 requests
> Finished 500 requests
>
>
> Server Software:
> Server Hostname:   x.amazonaws.com
> Server Port:8983
>
> Document Path:
>
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> Document Length:1538537 bytes
>
> Concurrency Level:  10
> Time taken for tests:   10.858 seconds
> Complete requests:  500
> Failed requests:8
>(Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> Write errors:   0
> Total transferred:  769297992 bytes
> HTML transferred:   769268492 bytes
> Requests per second:46.05 [#/sec] (mean)
> Time per request:   217.167 [ms] (mean)
> Time per request:   21.717 [ms] (mean, across all concurrent requests)
> Transfer rate:  69187.90 [Kbytes/sec] received
>
> Connection Times (ms)
>   min  mean[+/-sd] median   max
> Connect:00   0.3  0   2
> Processing:   110  215  72.0190 497
> Waiting:   91  180  70.5152 473
> Total:112  216  72.0191 497
>
> Percentage of the requests served within a certain time (ms)
>   50%191
>   66%225
>   75%252
>   80%272
>   90%319
>   95%364
>   98%420
>   99%453
>  100%497 (longest request)
>
>
> Sometimes it takes a lot of time, sometimes its pretty quick.
>
> Thanks,
> -Utkarsh
>
>
> On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar wrote:
>
> > Hello,
> >
> > I have a usecase where I need to retrive top 2000 documents matching a
> > query.
> > What are the parameters (in query, solrconfig, schema) I shoud look at to
> > improve this?
> >
> > I have 45M documents in 3node solrcloud 4.3.1 with 3 shards, with 30GB
> > RAM, 8vCPU and 7GB JVM heap size.
> >
> > I have documentCache:
> > <documentCache ... initialSize="100" autowarmCount="0"/>
> >
> > allText is a copyField.
> >
> > This is the result I get:
> > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> >
> http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > "
> >
> > Benchmarking x.amazonaws.com (be patient)
> > Completed 100 requests
> > Completed 200 requests
> > Completed 300 requests
> > Completed 400 requests
> > Completed 500 requests
> > Finished 500 requests
> >
> >
> > Server Software:
> > Server Hostname:x.amazonaws.com
> > Server Port:8983
> >
> > Document Path:
> >
> /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > Document Length:1538537 bytes
> >
> > Concurrency Level:  10
> > Time taken for tests:   35.999 seconds
> > Complete requests:  500
> > Failed requests:21
> >(Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
> > Write errors:   0
> > Non-2xx responses:  2
> > Total transferred:  766221660 bytes
> > HTML transferred:   766191806 bytes
> > Requests per second:13.89 [#/sec] (mean)
> > Time per request:   719.981 [ms] (mean)
> > Time per request:   71.998 [ms] (mean, across all concurrent
> requests)
> > Transfer rate:  20785.65 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >   min  mean[+/-sd] median   max
> > Connect:00   0.6  0   8
> > Processing: 9  717 2339.6199   12611
> > Waiting:9  635 2233.6164   12580
> > Total:  9  718 2339.6199   12611
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%199
> >   66%236
> >   75%263
> >   80%281
> >   90%548
> >   95%838
> >   98%  12475
> >   99%  12545
> >  100%  12611 (longest request)
> >
> > --
> > Thanks,
> > -Utkarsh
> >
>
>
>
> --
> Thanks,
> -Utkarsh
>


Re: documentCache not used in 4.3.1?

2013-06-29 Thread Erick Erickson
It's especially weird that the hit ratio is so high and you're
not seeing anything in the cache. Are you perhaps soft
committing frequently? Soft commits throw away all the
top-level caches including documentCache I think

Erick


On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt wrote:

> Thanks Otis,
>
> Yeah I realized after sending my e-mail that doc cache does not warm,
> however I'm still lost on why there are no other metrics.
>
> Thanks!
>
> Tim
>
>
> On 28 June 2013 16:22, Otis Gospodnetic 
> wrote:
>
> > Hi Tim,
> >
> > Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
> > are non-0, though I haven't had the chance to confirm with Solr 4.3.1.
> >
> > Note that you can't really autowarm document cache...
> >
> > Otis
> > --
> > Solr & ElasticSearch Support -- http://sematext.com/
> > Performance Monitoring -- http://sematext.com/spm
> >
> >
> >
> > On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt 
> > wrote:
> > > Hey guys,
> > >
> > > This has to be a stupid question/I must be doing something wrong, but
> > after
> > > frequent load testing with documentCache enabled under Solr 4.3.1 with
> > > autoWarmCount=150, I'm noticing that my documentCache metrics are
> always
> > > zero for non-cumulative.
> > >
> > > At first I thought my commit rate is fast enough I just never see the
> > > non-cumulative result, but after 100s of samples I still always get zero
> > > values.
> > >
> > > Here is the current output of my documentCache from Solr's admin for 1
> > core:
> > >
> > > "
> > >
> > >- documentCache<
> >
> http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
> > >
> > >   - class:org.apache.solr.search.LRUCache
> > >   - version:1.0
> > >   - description:LRU Cache(maxSize=512, initialSize=512,
> > >   autowarmCount=150, regenerator=null)
> > >   - src:$URL: https:/
> > >   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
> > >   solr/core/src/java/org/apache/solr/search/LRUCache.java<
> >
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java
> > >$
> > >   - stats:
> > >  - lookups:0
> > >  - hits:0
> > >  - hitratio:0.00
> > >  - inserts:0
> > >  - evictions:0
> > >  - size:0
> > >  - warmupTime:0
> > >  - cumulative_lookups:65198986
> > >  - cumulative_hits:63075669
> > >  - cumulative_hitratio:0.96
> > >  - cumulative_inserts:2123317
> > >  - cumulative_evictions:1010262
> > >   "
> > >
> > > The cumulative values seem to rise, suggesting doc cache is working,
> but
> > at
> > > the same time it seems I never see non-cumulative metrics, most
> > importantly
> > > warmupTime.
> > >
> > > Am I doing something wrong, is this normal/by-design, or is there an
> > issue
> > > here?
> > >
> > > Thanks for helping with my silly question! Have a good weekend,
> > >
> > > Tim
> >
>


Re: broken links returned from solr search

2013-06-29 Thread Erick Erickson
What links? You haven't shown us what link you're clicking on
that generates the 404 error.

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick


On Fri, Jun 28, 2013 at 2:04 PM, MA LIG  wrote:

> Hello,
>
> I ran the solr example as described in
> http://lucene.apache.org/solr/4_3_1/tutorial.html and then loaded some doc
> files to solr as described in
> http://wiki.apache.org/solr/ExtractingRequestHandler. The commands I used
> to load the files were of the form
>
>   curl "
> http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true"; -F
> "myfile=@test.doc"
>
> I can successfully see search results in
> http://localhost:8983/solr/collection1/browse<
> http://192.168.3.72:8983/solr/collection1/browse?q=test>
> .
>
> However, when I click on a link, I get a 404 not found error. How can I
> make these links work properly?
>
> Thanks in advance
>
> -gw
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
Setting convertType=false does solve the datetime issue.  But there are now
other columns that were working before but are not working now.  Since I have
already done some research into the datetime-to-date issue and have not been
able to find a solution, I think I will have to keep convertType set to
false and deal with the other column types that are not working now.

Thanks for your help.

Bill


On Sat, Jun 29, 2013 at 10:24 AM, Bill Au  wrote:

> I just double-checked my config.  We are using convertType=true.  Someone
> else came up with the config, so I am not sure why we are using it.  I will
> try with it set to false to see if something else will break.  Thanks for
> pointing that out.
>
> This is my first time using DIH.  I really like what I have seen so far.
>
> Bill
>
>
> On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> The default in JdbcDataSource is to use ResultSet.getObject which
>> returns the underlying database's type. The type specific methods in
>> ResultSet are not invoked unless you are using convertType="true".
>>
>> Is MySQL actually returning java.sql.Timestamp objects?
>>
>> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
>> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
>> > into a very strange problem where data from a datetime column is being
>> > imported with the right date but with the time as 00:00:00.  I tried using
>> > SQL DATE_FORMAT() and also the DIH DateFormatTransformer, but nothing works.
>> > In the raw debug response of DIH, it looks like the time portion of the
>> > datetime data is already 00:00:00 in the Solr JDBC query result.
>> >
>> > So I looked at the source code of the DIH JdbcDataSource class.  It is using
>> > java.sql.ResultSet and its getDate() method to handle date columns.  The
>> > getDate() method returns java.sql.Date.  The Java API doc for java.sql.Date
>> >
>> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
>> >
>> > states that:
>> >
>> > "To conform with the definition of SQL DATE, the millisecond values wrapped
>> > by a java.sql.Date instance must be 'normalized' by setting the hours,
>> > minutes, seconds, and milliseconds to zero in the particular time zone with
>> > which the instance is associated."
>> >
>> > This seems to be describing exactly my problem.  Has anyone else noticed
>> > this problem?  Has anyone used DIH to index SQL datetime successfully?  If
>> > so, can you send me the relevant portion of the DIH config?
>> >
>> > Bill
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>
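For reference, convertType is an attribute on the DIH dataSource element. A
minimal sketch of the relevant data-config.xml fragment (driver, URL, and
credentials are illustrative, not taken from the original posts):

  <!-- data-config.xml: with convertType="false" (the default), DIH calls
       ResultSet.getObject() and keeps the driver's native types, e.g.
       java.sql.Timestamp for MySQL datetime columns -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr"
              password="secret"
              convertType="false"/>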


Re: cores sharing an instance

2013-06-29 Thread Roman Chyla
Cores can be reloaded (they live inside the Solr core loader, I forget the exact
name) and they will have different classloaders (that's a servlet thing). So if
you want singletons, you must load them outside of the core, using a parent
classloader. In the case of Jetty, this means writing your own Jetty
initialization or config to force shared classloaders, or finding a place inside
Solr before the core is created. Google for montysolr to see an example of the
first approach.

But unless you really have no other choice, using singletons is IMHO a bad
idea in this case

Roman

On 29 Jun 2013 10:18, "Peyman Faratin"  wrote:
>
> it's the singleton pattern; in my case I want an object (which is
> RAM-expensive) to be a centralized coordinator of application logic.
>
> thank you
>
> On Jun 29, 2013, at 1:16 AM, Shalin Shekhar Mangar 
wrote:
>
> > There is very little shared between multiple cores (instanceDir paths,
> > logging config maybe?). Why are you trying to do this?
> >
> > On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin 
wrote:
> >> Hi
> >>
> >> I have a multicore setup (in 4.3.0). Is it possible for one core to
share an instance of its class with other cores at run time? i.e.
> >>
> >> At run time core 1 makes an instance of object O_i
> >>
> >> core 1 --> object O_i
> >> core 2
> >> ---
> >> core n
> >>
> >> then can core K access O_i? I know they can share properties but is it
possible to share objects?
> >>
> >> thank you
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>


Re: Solr 4.3.0 DIH problem with MySQL datetime being imported with time as 00:00:00

2013-06-29 Thread Bill Au
I just double-checked my config.  We are using convertType=true.  Someone
else came up with the config, so I am not sure why we are using it.  I will
try with it set to false to see if something else will break.  Thanks for
pointing that out.

This is my first time using DIH.  I really like what I have seen so far.

Bill


On Sat, Jun 29, 2013 at 1:45 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> The default in JdbcDataSource is to use ResultSet.getObject which
> returns the underlying database's type. The type specific methods in
> ResultSet are not invoked unless you are using convertType="true".
>
> Is MySQL actually returning java.sql.Timestamp objects?
>
> On Sat, Jun 29, 2013 at 5:22 AM, Bill Au  wrote:
> > I am running Solr 4.3.0, using DIH to import data from MySQL.  I am running
> > into a very strange problem where data from a datetime column is being
> > imported with the right date but with the time as 00:00:00.  I tried using
> > SQL DATE_FORMAT() and also the DIH DateFormatTransformer, but nothing works.
> > In the raw debug response of DIH, it looks like the time portion of the
> > datetime data is already 00:00:00 in the Solr JDBC query result.
> >
> > So I looked at the source code of the DIH JdbcDataSource class.  It is using
> > java.sql.ResultSet and its getDate() method to handle date columns.  The
> > getDate() method returns java.sql.Date.  The Java API doc for java.sql.Date
> >
> > http://docs.oracle.com/javase/6/docs/api/java/sql/Date.html
> >
> > states that:
> >
> > "To conform with the definition of SQL DATE, the millisecond values wrapped
> > by a java.sql.Date instance must be 'normalized' by setting the hours,
> > minutes, seconds, and milliseconds to zero in the particular time zone with
> > which the instance is associated."
> >
> > This seems to be describing exactly my problem.  Has anyone else noticed
> > this problem?  Has anyone used DIH to index SQL datetime successfully?  If
> > so, can you send me the relevant portion of the DIH config?
> >
> > Bill
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: cores sharing an instance

2013-06-29 Thread Peyman Faratin
It's the singleton pattern; in my case I want an object (which is RAM-expensive)
to be a centralized coordinator of application logic.

thank you

On Jun 29, 2013, at 1:16 AM, Shalin Shekhar Mangar  
wrote:

> There is very little shared between multiple cores (instanceDir paths,
> logging config maybe?). Why are you trying to do this?
> 
> On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin  
> wrote:
>> Hi
>> 
>> I have a multicore setup (in 4.3.0). Is it possible for one core to share an 
>> instance of its class with other cores at run time? i.e.
>> 
>> At run time core 1 makes an instance of object O_i
>> 
>> core 1 --> object O_i
>> core 2
>> ---
>> core n
>> 
>> then can core K access O_i? I know they can share properties but is it 
>> possible to share objects?
>> 
>> thank you
>> 
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.



Re: increase search score of certain category only for certain keyword

2013-06-29 Thread Jack Krupansky

Use the edismax query parser with a higher boost for category than name:

   qf=name category^10.0

Tune the boost as needed for your app.

Make sure name and category have both "text" and "string" variants - use
<copyField>. The string variant is good for facets, the text variant is good
for keyword search. Use the text variant in qf.
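A minimal sketch of how this might be wired up, assuming hypothetical
text-variant field names name_t and category_t and the stock text_general field
type:

  <!-- schema.xml: string originals for faceting, text copies for search -->
  <field name="name"       type="string"       indexed="true" stored="true"/>
  <field name="category"   type="string"       indexed="true" stored="true"/>
  <field name="name_t"     type="text_general" indexed="true" stored="false"/>
  <field name="category_t" type="text_general" indexed="true" stored="false"/>
  <copyField source="name" dest="name_t"/>
  <copyField source="category" dest="category_t"/>

and then query with parameters along the lines of:

  defType=edismax
  q=boot
  qf=name_t category_t^10.0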


-- Jack Krupansky

-Original Message- 
From: winsu

Sent: Friday, June 28, 2013 9:26 PM
To: solr-user@lucene.apache.org
Subject: increase search score of certain category only for certain keyword

Hi,

Currently I have some sample data:
name : summer boot
category : boot shoe

name  : snow boot
category : boot shoe

name : boot pant
category : pants

name : modern boot pant
category : pants

name : modern bootcut
category : pants


If the keyword search "boot" , how to make the item with category "shoe" has
higher rank than "pants" ?

can we setting at Solr to tell solr for certain keyword we need to give
"boot shoe" higher rank than other category ?
Thx :)








Re: Schema design for parent child field

2013-06-29 Thread Jack Krupansky
Both dynamic fields and multivalued fields are powerful Solr features that
can be used to great effect, but only if used in moderation - a relatively
small number of discrete values (e.g., a few dozen strings). Anything
more complex and you are asking for trouble, creating a pseudo-schema
that will be difficult to maintain or for anybody else to comprehend.


So, the simple answer to your question: flatten, in the most straightforward
manner - each instance of a "record type" should be a discrete Solr document,
and each "record" should get its own "id" as the Solr document key/ID. Solr can
support multiple document types in the same collection, or you can store each
record type in a separate collection.


The simplest, cleanest structure is to store each record type in a separate 
collection and then use multiple Solr queries to emulate SQL join operations 
as needed.


But if you would prefer to "mash" multiple record types into the same Solr 
collection/schema, you can do that too. Make the schema be the union of the 
schemas for each record type - Solr/Lucene has no significant overhead for 
fields which do not have values present for a given document.


Each document would have a unique ID field. In addition, each document would 
have a parent field for each record type, so you can quickly search for all 
children of a given parent. You can have one common parent ID if you assign 
unique IDs to all children across all record types, but it can sometimes be 
cleaner for the child ID to reset to zero/one for each new parent. It's 
merely a question of whether you want to have a single key value or a tuple 
of key values to identify a specific child.


You can duplicate a subset of the parent fields in each child to simulate 
the effect of a simple join in a single clean query. But you can do a 
separate query to get parent record details.
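A minimal sketch of the "mashed" variant, with hypothetical field names:

  <!-- schema.xml: one collection holding several record types -->
  <field name="id"          type="string" indexed="true" stored="true" required="true"/>
  <field name="record_type" type="string" indexed="true" stored="true"/>
  <field name="parent_id"   type="string" indexed="true" stored="true"/>

Children of a given parent can then be fetched with a query such as
q=parent_id:P123&fq=record_type:claim (values illustrative).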


-- Jack Krupansky

-Original Message- 
From: Sperrink

Sent: Saturday, June 29, 2013 5:08 AM
To: solr-user@lucene.apache.org
Subject: Schema design for parent child field

Good day,
I'm seeking some guidance on how best to represent the following data within
a solr schema.
I have a list of subjects which are detailed to n levels.
Each document can contain many of these subject entities.
As I see it, if this had been just one subject per document, dynamic fields
would have been a good solution.
Any suggestions on how best to create this structure in a denormalised
fashion while maintaining the data integrity.
For example a document could have:
Subject level 1: contract
Subject level 2: claims
Subject level 1: patent
Subject level 2: counter claims

If I were to search for level 1 contract, I would only want the facet count
for level 2 to contain claims and not counter claims.

Any assistance in this would be much appreciated.







Http status 503 Error in solr cloud setup

2013-06-29 Thread Sagar Chaturvedi
Hi,

I set up 2 Solr instances on 2 different machines and also configured 2
ZooKeeper servers on these machines. When I start Solr on both machines and try
to access the Solr web admin, I get the following error in the browser:
"Http status 503 - server is shutting down"

When I set up a single standalone Solr without ZooKeeper, I do not get this
error.

Any insights?

Thanks and Regards,
Sagar Chaturvedi
Member Of Technical Staff
NEC Technologies India, Noida
09711931646






Re: FileDataSource vs JdbcDataSource (speed) Solr 3.5

2013-06-29 Thread Ahmet Arslan
Hi Mike,


You could try http://wiki.apache.org/solr/UpdateCSV 

And make sure you commit at the very end.
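For example, a tab-delimited file could be posted along these lines (assuming
the /update/csv handler is enabled in solrconfig.xml; the file name is
illustrative, and separator=%09 is the URL-encoded tab character):

  curl "http://localhost:8983/solr/update/csv?separator=%09&commit=true" \
       --data-binary @myfile.tsv \
       -H 'Content-type:text/plain; charset=utf-8'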





 From: Mike L. 
To: "solr-user@lucene.apache.org"  
Sent: Saturday, June 29, 2013 3:15 AM
Subject: FileDataSource vs JdbcDataSource (speed) Solr 3.5
 

 
I've been working on improving index time with a JdbcDataSource DIH-based
config and found it not to be as performant as I'd hoped, for various
reasons, not specifically due to Solr. With that said, I decided to switch
gears a bit and test out a FileDataSource setup... I assumed that by eliminating
network latency I should see drastic improvements in import time... but
I'm a bit surprised that this process seems to run much slower, at least the
way I've initially coded it. (below)
 
Below is a bare-bones file import that I wrote which consumes a tab-delimited
file. Nothing fancy here. The regex just separates out the fields... Is there a
faster approach to doing this? If so, what is it?
 
Also, what is the "recommended" approach in terms of indexing/importing data? I
know that may come across as a vague question, as there are various options
available, but which one would be considered the "standard" approach within a
production enterprise environment?
 
 
(below has been cleansed)

[data-config.xml snippet stripped by the mailing-list archive]
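Purely as an illustration of the approach being described (the original config
is not recoverable), a bare-bones tab-delimited FileDataSource setup might look
something like this, with made-up paths and field names:

  <!-- data-config.xml sketch: LineEntityProcessor reads the file one line at a
       time into rawLine; RegexTransformer splits each line into fields -->
  <dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8"/>
    <document>
      <entity name="line"
              processor="LineEntityProcessor"
              url="/data/export.tsv"
              rootEntity="true"
              transformer="RegexTransformer">
        <field column="rawLine"
               regex="^([^\t]*)\t([^\t]*)\t(.*)$"
               groupNames="id,name,description"/>
      </entity>
    </document>
  </dataConfig>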
 
Thanks in advance,
Mike

increase search score of certain category only for certain keyword

2013-06-29 Thread winsu
Hi,

Currently I have some sample data:
name : summer boot
category : boot shoe

name  : snow boot
category : boot shoe

name : boot pant
category : pants

name : modern boot pant
category : pants

name : modern bootcut
category : pants


If the keyword search "boot" , how to make the item with category "shoe" has
higher rank than "pants" ? 

can we setting at Solr to tell solr for certain keyword we need to give
"boot shoe" higher rank than other category ?
Thx :)







Schema design for parent child field

2013-06-29 Thread Sperrink
Good day,
I'm seeking some guidance on how best to represent the following data within
a solr schema.
I have a list of subjects which are detailed to n levels.
Each document can contain many of these subject entities.
As I see it, if this had been just one subject per document, dynamic fields
would have been a good solution.
Any suggestions on how best to create this structure in a denormalised
fashion while maintaining the data integrity.
For example a document could have:
Subject level 1: contract
Subject level 2: claims
Subject level 1: patent
Subject level 2: counter claims

If I were to search for level 1 contract, I would only want the facet count
for level 2 to contain claims and not counter claims.

Any assistance in this would be much appreciated.
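One common way to get this behavior (a sketch, assuming a hypothetical
subject_path field) is to index each subject as an encoded path and drill down
with facet.prefix:

  <!-- schema.xml: one multivalued string field holding encoded paths -->
  <field name="subject_path" type="string" indexed="true" stored="true"
         multiValued="true"/>

A document would then carry values such as "contract", "contract/claims",
"patent", and "patent/counter claims". Filtering on level 1 with
fq=subject_path:"contract" and faceting with facet.field=subject_path and
facet.prefix=contract/ returns only "contract/claims" at level 2, not
"counter claims".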



