[
https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865226#action_12865226
]
Enis Soztutar commented on NUTCH-811:
-
Hi Piet,
The code for Gora will reside in GitHub
[
https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861010#action_12861010
]
Enis Soztutar commented on NUTCH-811:
-
I have further developed the code, which was once
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar closed NUTCH-808.
---
Resolution: Fixed
We have decided to go on with implementing an ORM layer as per the discussion
on NU
[
https://issues.apache.org/jira/browse/NUTCH-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856546#action_12856546
]
Enis Soztutar commented on NUTCH-811:
-
Actually, we plan to develop the code for this la
Develop an ORM framework
-
Key: NUTCH-811
URL: https://issues.apache.org/jira/browse/NUTCH-811
Project: Nutch
Issue Type: New Feature
Reporter: Enis Soztutar
Assignee: Enis Soztutar
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856360#action_12856360
]
Enis Soztutar commented on NUTCH-808:
-
bq. What do you mean by current implementation? N
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856124#action_12856124
]
Enis Soztutar commented on NUTCH-808:
-
So, these are the results so far:
DataNucleus wa
[
https://issues.apache.org/jira/browse/NUTCH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852840#action_12852840
]
Enis Soztutar commented on NUTCH-808:
-
A candidate framework is DataNucleus. It has the
Evaluate ORM Frameworks which support non-relational column-oriented datastores
and RDBMs
--
Key: NUTCH-808
URL: https://issues.apache.org/jira/browse/NUTCH-808
[
https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637489#action_12637489
]
Enis Soztutar commented on NUTCH-442:
-
I personally believe this patch should be in befo
[
https://issues.apache.org/jira/browse/NUTCH-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar resolved NUTCH-588.
-
Resolution: Invalid
Jira is not for asking questions. You should ask your questions on nutch-user
[
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-586:
Attachment: run-core_v2.patch
bq. I think you also need to put a comment, which clarifies that this
[
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548198
]
Enis Soztutar commented on NUTCH-586:
-
Can someone review this ?
> Add option to run compiled classes w/o job fil
[
https://issues.apache.org/jira/browse/NUTCH-586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-586:
Attachment: run-core_v1.patch
Attached file adds -core option to bin/nutch.
> Add option to run co
Add option to run compiled classes w/o job file
---
Key: NUTCH-586
URL: https://issues.apache.org/jira/browse/NUTCH-586
Project: Nutch
Issue Type: New Feature
Affects Versions: 1.0.0
FeedParser empty links for items
Key: NUTCH-583
URL: https://issues.apache.org/jira/browse/NUTCH-583
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Enis Soztutar
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543067
]
Enis Soztutar commented on NUTCH-573:
-
So, how shall we proceed with this one?
I give +1 to commit this, and deal
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542449
]
Enis Soztutar commented on NUTCH-573:
-
@Andrzej
I recall google over comma delimited syntax, but now it doesn't wo
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542389
]
Enis Soztutar commented on NUTCH-573:
-
bq. Using commas is IMHO not intuitive
With all due respect, I have to disagree
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-573:
Fix Version/s: 1.0.0
Priority: Major (was: Minor)
Affects Version/s: (was:
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-573:
Attachment: multiTermQuery_v1.patch
Here is a patch that enables querying multiple values for the sa
[
https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar reassigned NUTCH-573:
---
Assignee: Enis Soztutar
> Multiple Domains - Query Search
> ---
>
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541631
]
Enis Soztutar commented on NUTCH-574:
-
bq. Is this the type of process you were talking about with selecting most
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541359
]
Enis Soztutar commented on NUTCH-574:
-
Honestly, I don't think not indexing anchor words that do not appear in the
[
https://issues.apache.org/jira/browse/NUTCH-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541326
]
Enis Soztutar commented on NUTCH-574:
-
Why don't you just refactor indexing anchor code into another plugin, say
[
https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537954
]
Enis Soztutar commented on NUTCH-442:
-
Due to the method signature bug
(http://bugs.sun.com/bugdatabase/view_bug.
[
https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534869
]
Enis Soztutar commented on NUTCH-442:
-
Using Nutch with Solr has been a much-requested feature, so it will be very
[
https://issues.apache.org/jira/browse/NUTCH-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530656
]
Enis Soztutar commented on NUTCH-558:
-
I wonder why you do not use URLUtils introduced in NUTCH-439. Also there is
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521033
]
Enis Soztutar commented on NUTCH-439:
-
Recently Matt Cutts has written about the parts of URLs:
http://www.matt
Index url field untokenized
---
Key: NUTCH-541
URL: https://issues.apache.org/jira/browse/NUTCH-541
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515987
]
Enis Soztutar commented on NUTCH-439:
-
By the way, Andrzej could you please enable support for wiki style editing
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.3.patch
bq. TLDScoringFilter contains a misspelled field, tldEnties, it sh
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513826
]
Enis Soztutar commented on NUTCH-518:
-
> I think removing initial score arguments and merging scores in
> Scoring
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513819
]
Enis Soztutar commented on NUTCH-518:
-
Since there is no ordering among scoring filters, if we do something specif
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.2.patch
This patch includes "core" domain utilities and the tld plugin, bu
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513482
]
Enis Soztutar commented on NUTCH-439:
-
As for Doğacan's comments I've opened issues NUTCH-518 and NUTCH-517.
> T
[
https://issues.apache.org/jira/browse/NUTCH-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-518:
Attachment: opicScoring.chain.patch
Patch is attached, which was formerly a part of the patch in NUT
Fix OpicScoringFilter to respect scoring filter chaining
Key: NUTCH-518
URL: https://issues.apache.org/jira/browse/NUTCH-518
Project: Nutch
Issue Type: Bug
Components: indexe
[
https://issues.apache.org/jira/browse/NUTCH-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-517:
Attachment: build.encoding.patch
Patch for UTF-8 is attached
> build encoding should be UTF-8
> ---
build encoding should be UTF-8
--
Key: NUTCH-517
URL: https://issues.apache.org/jira/browse/NUTCH-517
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Enis Soztutar
Fix
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.1.patch
Oops, it seems that I've uploaded the wrong file. This is the corr
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: (was: domain.suffixes_v2.1.patch)
> Top Level Domains Indexing / Scoring
> -
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: domain.suffixes_v2.1.patch
> Very nice patch!
Thanks !
> IP_PATTERN - it could be tight
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v2.0.patch
I have made major improvements to the code and configuration files
[
https://issues.apache.org/jira/browse/NUTCH-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511121
]
Enis Soztutar commented on NUTCH-508:
-
Tasktracker invokes another jvm calling TaskTracker$Child but hadoop.log.di
[
https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511043
]
Enis Soztutar edited comment on NUTCH-510 at 7/9/07 5:32 AM:
-
Attached patch deletes worki
[
https://issues.apache.org/jira/browse/NUTCH-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-510:
Attachment: index.merger.delete.temp.dirs.patch
Attached patch deletes working dirs on finally claus
IndexMerger delete working dir
--
Key: NUTCH-510
URL: https://issues.apache.org/jira/browse/NUTCH-510
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: 1.0.0
Re
Implement a different caching mechanism for objects
cached in configuration
[ https://issues.apache.org/jira/browse/NUTCH-501?page=com.atlassian.jira.plu
[
https://issues.apache.org/jira/browse/NUTCH-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505079
]
Enis Soztutar commented on NUTCH-498:
-
I think you may not want
{code}
reporter.incrCounter(Counters.COMBINED, c
[
https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-471:
Attachment: NutchBeanCreationSync_v2.patch
From http://www-128.ibm.com/developerworks/java/library/
[
https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491882
]
Enis Soztutar commented on NUTCH-475:
-
we can use a formula like :
delay = alpha * delay + (1 - alpha) * (k * t)
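The formula above is an exponential moving average, and a minimal sketch of one update step may make it easier to follow. The class name `AdaptiveDelay`, the method `next`, and the sample values are hypothetical; the comment does not say what `k` and `t` stand for, so the sketch assumes `t` is an observed response time and `k` a scaling factor.

```java
// Illustrative sketch only; not code from the NUTCH-475 discussion.
class AdaptiveDelay {

    // One smoothing step: delay = alpha * delay + (1 - alpha) * (k * t).
    // alpha close to 1 keeps the old delay; alpha close to 0 follows k * t.
    static double next(double delay, double alpha, double k, double t) {
        return alpha * delay + (1 - alpha) * (k * t);
    }

    public static void main(String[] args) {
        double delay = 1.0;                  // assumed starting delay
        double[] observed = {0.5, 0.5, 2.0}; // assumed values of t
        for (double t : observed) {
            delay = next(delay, 0.5, 2.0, t);
            System.out.println(delay);
        }
    }
}
```

With a larger `alpha`, a single slow response changes the delay only slightly, which is the usual reason for smoothing instead of using `k * t` directly.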
[
https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491313
]
Enis Soztutar commented on NUTCH-471:
-
> Nice trick with the unsynchronized check. :)
Wow, indeed I have used a pa
[
https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-471:
Attachment: NutchBeanCreationSync_v1.patch
This patch synchronizes NutchBean.get(ServletContext app
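As a rough illustration of synchronizing lazy creation of a shared object, here is a minimal sketch. It is not the attached patch: `BeanCache`, `KEY`, and the `Map` standing in for the servlet context are all hypothetical, and the real `NutchBean` API is not used.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; a Map stands in for the servlet context.
class BeanCache {

    static final String KEY = "exampleBean"; // hypothetical attribute name

    // The check and the creation happen under one lock, so two threads
    // cannot both see "missing" and create separate instances.
    static Object get(Map<String, Object> app) {
        synchronized (app) {
            Object bean = app.get(KEY);
            if (bean == null) {
                bean = new Object(); // placeholder for the expensive bean
                app.put(KEY, bean);
            }
            return bean;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> app = new HashMap<>();
        System.out.println(get(app) == get(app)); // same instance both times
    }
}
```

Taking the lock on every call is the simplest safe choice; a check outside the lock is faster but harder to get right under the Java memory model.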
Fix synchronization in NutchBean creation
-
Key: NUTCH-471
URL: https://issues.apache.org/jira/browse/NUTCH-471
Project: Nutch
Issue Type: Bug
Components: searcher
Affects Versions: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485996
]
Enis Soztutar commented on NUTCH-466:
-
>> There may be many parts that use the same key/value classes in MapFiles.
[
https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12485977
]
Enis Soztutar commented on NUTCH-466:
-
This patch will indeed resolve many issues related to storing extra informa
[
https://issues.apache.org/jira/browse/NUTCH-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484447
]
Enis Soztutar commented on NUTCH-464:
-
Opening an issue to ask for help is not a good practice. You should instead
[
https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479262
]
Enis Soztutar commented on NUTCH-455:
-
(from LUCENE-252)
In nutch we have 3 options : 1st is to disallow deleting
[
https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-455:
Attachment: IndexSearcherCacheWarm.patch
The patch to the IndexSearcher is attached
> dedup on toke
dedup on tokenized fields is faulty
---
Key: NUTCH-455
URL: https://issues.apache.org/jira/browse/NUTCH-455
Project: Nutch
Issue Type: Bug
Components: searcher
Affects Versions: 0.9.0
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-445:
Attachment: index_query_domain_v1.2.patch
This patch is an update of the previous three patches.
Th
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12476550
]
Enis Soztutar commented on NUTCH-445:
-
Well, indeed the current two patches TranslatingRawFieldQueryFilter_v1.0.p
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-445:
Attachment: index_query_domain_v1.1.patch
This patch fixes the raw field name bug in v1.0 and adds t
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-445:
Attachment: TranslatingRawFieldQueryFilter_v1.0.patch
This patch complements index_query_domain_v1.0
[
https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-445:
Attachment: index_query_domain_v1.0.patch
Patch for index-domain and query-domain plugins.
> Domai
Domain Indexing / Query Filter
--
Key: NUTCH-445
URL: https://issues.apache.org/jira/browse/NUTCH-445
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Affects Versions: 0.9.0
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.1.patch
I accidentally forgot to unset http.agent.name in v1.0
[
https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar updated NUTCH-439:
Attachment: tld_plugin_v1.0.patch
This is a plugin implementation for indexing and scoring top level
Top Level Domains Indexing / Scoring
Key: NUTCH-439
URL: https://issues.apache.org/jira/browse/NUTCH-439
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 0.9.0
[ http://issues.apache.org/jira/browse/NUTCH-251?page=all ]
Enis Soztutar updated NUTCH-251:
Attachment: Nutch-251-AdminGUI.tar.gz
I have updated the patch written by stephan.
This version works with Nutch-0.9-dev and hadoop-0.7.1 (current version of
nu
[ http://issues.apache.org/jira/browse/NUTCH-289?page=all ]
Enis Soztutar updated NUTCH-289:
Attachment: ipInCrawlDatumDraftV5.1.patch
The version 5 patch does not run on the current build, so I have fixed it and
resent the patch (did not change any cod
[
http://issues.apache.org/jira/browse/NUTCH-393?page=comments#action_12447787 ]
Enis Soztutar commented on NUTCH-393:
-
Also, IndexingException is caught by the Indexer, in which case the whole
document is not added to the writer (the funct
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Attachment: urlTokenizer-improved.diff
This is an improvement and a minor bug fix over the previous url tokenizer.
This version first replaces characters, which
[
http://issues.apache.org/jira/browse/NUTCH-389?page=comments#action_12445512 ]
Enis Soztutar commented on NUTCH-389:
-
Otis, you can test the tokenizer using the TestUrlTokenizer junit test case. And
you can test the NutchDocumentTokenizer b
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Description:
NutchAnalysis.jj tokenizes the input by treating & and _ as non-token
separators, which is not appropriate in the case of URLs. So I have writ
[ http://issues.apache.org/jira/browse/NUTCH-389?page=all ]
Enis Soztutar updated NUTCH-389:
Attachment: urlTokenizer.diff
Patch for URL tokenization
> a url tokenizer implementation for tokenizing index fields : url and host
> -
a url tokenizer implementation for tokenizing index fields : url and host
--
Key: NUTCH-389
URL: http://issues.apache.org/jira/browse/NUTCH-389
Project: Nutch
Issue Typ
[
http://issues.apache.org/jira/browse/NUTCH-356?page=comments#action_12431548 ]
Enis Soztutar commented on NUTCH-356:
-
I observed strange behaviour, when one of the plug-ins could not be included.
For example the plugin system fails to load