Hello, please send resumes to [email protected], or reach me at 408-604-9252 ASAP.
*Job Description:*

*Title: Data Analyst - Sr*
*Location: Reston, VA*
*Pay rate: open*
*Duration: 3 months*

*Job Description:*
Responsible for data analysis, validation, cleansing, collection, and reporting. Extract and analyze data from various sources, including databases, manual files, and external websites. Respond to data inquiries from various groups within an organization. Create and publish regularly scheduled and/or ad hoc reports as needed. Document reporting requirements, and process and validate data components as required. Experience with relational databases and knowledge of query tools and/or statistical software required. Strong analytical and organizational skills required. Must possess expert-level knowledge of MS Excel. 6+ years of prior experience as a Data Analyst is required.

Skills
• More than 7 years of deep involvement with SQL and data warehouses.
• 2+ years of experience with HIVE.
• Working familiarity with Hadoop clusters.

Goal
The data warehouse is the most comprehensive set of domain-oriented transactions covering most of Verisign's operated TLDs. It consolidates data from both Core and the Namestore system to provide a rich set of data over all Verisign-operated TLDs (dotNAME will be added in 2013Q3). The data within the warehouse is currently utilized by a multitude of external-facing services such as IPS, Data Analyzer, and WhoWas, and by internal-facing services and efforts such as Strategy and domain renewal forecasting. The goal of this effort is to provide easy access, in the Compute Cluster environment, to current and relevant data fields currently stored within the Data Warehouse. This effort will help existing products utilize the additional data to extend or enhance their products and services, and eliminate the need for additional one-off data transfer scripts.
Work
We have determined that we will load into the Compute Cluster environment a one-time dump of a big table (~1B rows, 30+ fields) representing all the domain-related transactions that have ever occurred; on a daily basis, a much smaller table (100k-200k rows) of the day's transactions will be ingested from the Data Warehouse into the Compute Cluster. The work will cover the following 3 areas:

1. Propose and implement, in coordination with the Data Warehouse team, the method for generating the one-time dump and the daily increments and ingesting them into the Compute Cluster. Propose how the data should be stored in the Compute Cluster (e.g., use partitions, append updates into a large file, etc.).

2. Implement HIVE queries for the following user queries (to be executed in the Compute Cluster):
• What domains are not currently registered but have been registered in the past?
• Which domains were registered between time X and Y? And passed the AGP?
• Which domains expired between time X and Y?
• Which domains were active in a particular TLD at time X?
• What set of domains were previously registered?
• For a given domain, when was it last registered?
• How many different registrars has a given domain been registered by? Which ones?
• What are the temporal registration patterns of these (a list of) domains?
• How many times has this domain been renewed? Transferred? Deleted?
• How many total days has this domain been active?
• What were the domains in the zone for a specific date? Perhaps pre-calculate those for every date.

Additional user queries might be provided for implementation as HIVE queries. If possible, care should be taken to use only UDFs that are also available in IMPALA, as IMPALA might be used for faster query processing of the Data Warehouse data in the Compute Cluster. One known complication is the handling of time as a parameter for these queries, given HIVE's constraints in handling time.
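For the storage question in area (1), one common layout is to partition the transaction table by load date, so the one-time historical dump is bulk-loaded once and each daily extract lands in its own partition. A minimal HiveQL sketch; the table name, columns, and paths below are illustrative assumptions, not taken from this posting:

```sql
-- Hypothetical schema: one row per domain transaction, timestamps kept as
-- UNIX epoch seconds, partitioned by the day the extract was ingested.
CREATE TABLE IF NOT EXISTS dw_domain_transactions (
  domain_name   STRING,
  tld           STRING,
  op_type       STRING,   -- e.g. REGISTER / RENEW / TRANSFER / EXPIRE / DELETE
  registrar_id  STRING,
  tx_time_epoch BIGINT
)
PARTITIONED BY (load_date STRING)
STORED AS RCFILE;  -- a columnar format readable by both HIVE and IMPALA

-- Daily increment (100k-200k rows): load the day's extract into its own partition.
LOAD DATA INPATH '/ingest/dw/2013-07-01/'
  INTO TABLE dw_domain_transactions
  PARTITION (load_date = '2013-07-01');
```

Partitioning by load date keeps the daily ingest cheap (a file move plus a metastore update) and lets queries prune to recent partitions instead of rescanning the ~1B-row history.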
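The time-handling complication noted above can be sidestepped by storing timestamps as UNIX epoch seconds and converting the user's date parameters with unix_timestamp() at query time; that function exists in both HIVE and IMPALA, so the queries stay IMPALA-compatible. A sketch for the "which domains expired between time X and Y" query, assuming the hypothetical table and column names below (the ${hiveconf:...} substitution is a Hive CLI convenience; under IMPALA the date literals would be supplied directly):

```sql
-- Assumed schema: dw_domain_transactions(domain_name, tld, op_type,
-- tx_time_epoch BIGINT), with tx_time_epoch in UNIX epoch seconds.
-- unix_timestamp(str, pattern) turns a user-supplied date into epoch seconds,
-- so the range check is a plain numeric comparison.
SELECT domain_name, tld, tx_time_epoch
FROM dw_domain_transactions
WHERE op_type = 'EXPIRE'
  AND tx_time_epoch >= unix_timestamp('${hiveconf:start_date}', 'yyyy-MM-dd')
  AND tx_time_epoch <  unix_timestamp('${hiveconf:end_date}',   'yyyy-MM-dd');
```

The same pattern (half-open epoch range on tx_time_epoch) covers the other time-bounded queries in the list, such as registrations between X and Y or activity at time X.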
One option is to convert time in the Data Warehouse data into UNIX_EPOCH time, so that the HIVE queries can convert user-inputted dates into UNIX_EPOCH time for execution against the source data. Alternative recommendations will be appreciated.

3. Propose and implement the optimal ways for users to interact with the Data Warehouse data in the Compute Cluster, e.g.:
- pre-compute data sets that will be heavily reused
- create HIVE views for the queries in (2) that users can re-use with specific parameters
- use additional partitions for the source or pre-computed data

*----*
*Thank you,*
*Yuvi*
*Sr. Associate Talent Acquisition*
*Bartronics America*
*Ph: 408-604-9252*
*Email: [email protected]*
*Yahoo IM: yuvi.recruiter | Gtalk: yuvi1.recruiter*
