GitHub user lukovnikov opened a pull request:
https://github.com/apache/spark/pull/4650
RDF Loader added + documentation
I have been testing it with DBpedia dumps, and it works well so far.
Any help with custom partitioning and optimization is welcome.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lukovnikov/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4650.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4650
----
commit 10436d252ad4876d28c91c77036e3d993050438a
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 595aed098fb423514b73263f96dfcaf1edbc72f5
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T21:41:00Z
dictionary builder done
commit c2399023825e804476527f7e159b182a1b5c91c8
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T21:44:07Z
[SPARK 5280]
commit f14e4835cf365fcbe5dd0979e61464b7cecb8774
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T22:50:06Z
done dictionary version
commit 43cc53ab6d99a4a96a0764cc306f38fdce3a7e00
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T23:25:07Z
[SPARK 5280] rdfloader using hashes as VertexIds
commit 2e1220d0938aee7d190439253e3b9bb1e73c77e8
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:04:48Z
cleaned up + fixed style
TODO: test + comment
commit 54e2c6eb24dade70753320a3ab2b3a64fef7a6d4
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:26:30Z
made custom 64bit hash
commit b454560508c9d50c60e067d7e67405ca1e13c165
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:32:57Z
proper
commit 45a9f57695e76c09c20fa99a1010168f63ef1da8
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 6ee9a2b675d06675b5b591f16e8d52e63d2dc049
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T21:41:00Z
dictionary builder done
commit 45c22160c52111066109f57a0d773aca211c2068
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T21:44:07Z
[SPARK 5280]
commit fa5c0da9ea4f6ca662406b380432901022d6de55
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T22:50:06Z
done dictionary version
commit c036f98476e96ac03124f758ed7f17c4a464cf86
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T23:25:07Z
[SPARK 5280] rdfloader using hashes as VertexIds
commit 57553797f7404e686674b0bfb39d80bb24d6520c
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:04:48Z
cleaned up + fixed style
TODO: test + comment
commit e00123eae4a84108af2c84cf253b1f4fb1fb69f1
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:26:30Z
made custom 64bit hash
commit 6af9a7ad6198174597ae7d86ec5c15fc8467a082
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T00:32:57Z
proper
commit 1ee34c9474bcf4500edecb08a848d15f3549055d
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T03:31:05Z
Merge branch 'master' of github.com:lukovnikov/spark into rdfloaderhash
commit 9000a4713d286d5078c16f62b5fadf480941bc82
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T03:31:18Z
Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash
commit 70eb725a102ae711a59c6d45794d191c18778c4b
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:02:48Z
RDF Loader with hash, tested on small RDF dumps (more tests in progress)
commit 4398d93712777442ba0f2e8920423fcdd7b67d1f
Author: Denis <[email protected]>
Date: 2015-02-04T23:27:01Z
added documentation for RDFLoader
commit 273a1b30dee1630333e0f7e683378b6dbb13c3a5
Author: Denis <[email protected]>
Date: 2015-02-04T23:29:05Z
small update to RDFLoader description
commit 202ccf86901c3d2435564e544f90d6a49cda66fb
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:31:10Z
sdf
commit 2d990cec1d48f62f4f1d9f9cf8082308a4eaf9e4
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-03T19:41:58Z
fast forward from upstream
commit 4a9b6222176749bee4a14e4b6d035b665c6ac7ea
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:43:31Z
Merge branch 'master' of github.com:lukovnikov/spark
commit 062996c45d0443836c1b4b2bb714d8f459ea6980
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:43:52Z
Merge branch 'rdfloaderhash'
commit 121bf14140573349424e7888da13ee2e8ea4f6f0
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:45:48Z
[SPARK 5280]
commit 67ada514b98292ff647d8354545d37cc111499ba
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:47:21Z
Merge branch 'rdfloaderhash' of github.com:lukovnikov/spark into rdfloaderhash
commit e5fcf758c0e4b54a38b2a01709681e11bbb6eae8
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:47:45Z
Merge branch 'rdfloaderhash'
commit c5960af7b14d65b1d290c3af11d722075a54ad2d
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-04T23:54:37Z
Merge remote-tracking branch 'upstream/master'
commit 91361f3f760dbc78467f8e2b87a1d77061aa59de
Author: lukovnikov <lukovnikov@denis>
Date: 2015-02-05T00:01:33Z
undone unnecessary changes
----